Which is the most cost-effective design?
Update the ingestion process to use Amazon Kinesis Data Firehose to save data to Amazon S3. Use a fleet of On-Demand EC2 instances that launches each night to perform the batch processing of the S3 data and terminates when the processing completes.
Update the ingestion process to use Amazon Kinesis Data Firehose to save data to Amazon S3. Use AWS Batch to perform nightly processing with a Spot market bid of 50% of the On-Demand price.
Update the ingestion process to use a fleet of EC2 Reserved Instances with 3-year leases behind a Network Load Balancer. Use AWS Batch with Spot Instances with a maximum bid of 50% of the On-Demand price for the nightly processing.
Update the ingestion process to use Amazon Kinesis Data Firehose to save data to Amazon Redshift. Use an AWS Lambda function scheduled to run nightly with Amazon CloudWatch Events to query Amazon Redshift to generate the daily statistics.
Explanations:
Using Amazon Kinesis Data Firehose to save data to Amazon S3 is a good choice for ingesting streaming data, but running the nightly processing on On-Demand EC2 instances pays full price for compute that could run on Spot Instances. Because the statistical analysis is not mission-critical, the job can tolerate Spot interruptions, so On-Demand pricing buys availability the workload does not need. This makes the option more expensive than one that uses AWS Batch with Spot Instances.
This option uses Kinesis Data Firehose to save data to S3, which is cost-effective for ingestion because there are no ingestion servers to run or pay for. Running the nightly processing on AWS Batch with Spot Instances at a maximum of 50% of the On-Demand price significantly reduces compute cost, and a managed Batch compute environment can scale capacity up for the nightly job and back down afterward. This makes it the most cost-effective design.
This option does use AWS Batch with Spot Instances for the nightly processing, which keeps that part of the cost low. However, it replaces managed ingestion with a fleet of EC2 Reserved Instances behind a Network Load Balancer, which goes against the goal of cost-effectiveness. Reserved Instances are a fixed 3-year commitment that is paid for regardless of actual ingestion volume, and the fleet and load balancer must still be operated, whereas Kinesis Data Firehose is billed by usage.
Although using Kinesis Data Firehose to save data to Amazon Redshift is a valid approach, it introduces unnecessary complexity and cost. A data warehouse is generally a more expensive place to hold the data than S3 when it is only processed once per night, and the scheduled Lambda function adds its own execution costs. Lambda's execution time limit also makes it a poor fit for a long-running nightly batch job, so this architecture is less suitable than using AWS Batch.
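To make the "Spot market bid of 50% of the On-Demand price" concrete, here is a minimal Python sketch of the settings such a managed AWS Batch compute environment might use. It only builds the request dictionary and does not call AWS. The helper names, the environment name, the subnet, security group, and instance role values, the vCPU ceiling, and the example On-Demand price are all hypothetical placeholders, not values from the question.

```python
def max_spot_price(on_demand_hourly_usd: float, bid_percentage: int) -> float:
    """Highest hourly Spot price accepted for a given On-Demand price."""
    return on_demand_hourly_usd * bid_percentage / 100


def build_spot_compute_environment(
    name: str,
    subnets: list,
    security_group_ids: list,
    instance_role: str,
    max_vcpus: int = 64,        # hypothetical ceiling for the nightly job
    bid_percentage: int = 50,   # the 50% cap from the chosen option
) -> dict:
    """Build keyword arguments for AWS Batch's CreateComputeEnvironment call."""
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",
        "computeResources": {
            "type": "SPOT",
            "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
            # Launch Spot Instances only while the Spot price is at or
            # below this percentage of the On-Demand price.
            "bidPercentage": bid_percentage,
            # A minimum of zero lets the environment scale down to no
            # instances once the nightly job has finished.
            "minvCpus": 0,
            "maxvCpus": max_vcpus,
            "instanceTypes": ["optimal"],
            "subnets": list(subnets),
            "securityGroupIds": list(security_group_ids),
            "instanceRole": instance_role,
        },
    }


if __name__ == "__main__":
    # All identifiers below are made-up placeholders.
    request = build_spot_compute_environment(
        name="nightly-stats-spot",
        subnets=["subnet-EXAMPLE"],
        security_group_ids=["sg-EXAMPLE"],
        instance_role="ecsInstanceRole",
    )
    print(request["computeResources"]["bidPercentage"])
    # Assumed On-Demand price of 0.10 USD per hour, for illustration only.
    print(max_spot_price(0.10, 50))
```

With valid credentials and real resource IDs, a dictionary like this could be passed to `boto3.client("batch").create_compute_environment(**request)`. Setting `minvCpus` to 0 is what keeps the environment from accruing instance charges between nightly runs.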