Which is the most cost-effective design?

A company ingests and processes streaming market data. The data rate is constant. A nightly process that calculates aggregate statistics is run, and each execution takes about 4 hours to complete. The statistical analysis is not mission critical to the business, and previous data points are picked up on the next execution if a particular run fails. The current architecture uses a pool of Amazon EC2 Reserved Instances with 1-year reservations running full time to ingest and store the streaming data in attached Amazon EBS volumes. On-Demand EC2 instances are launched each night to perform the nightly processing, access the stored data from NFS shares on the ingestion servers, and are terminated when the processing completes. The Reserved Instance reservations are expiring, and the company needs to determine whether to purchase new reservations or implement a new design.
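The detail that data points from a failed run are picked up on the next execution is what makes this workload tolerant of interruptions. A minimal sketch of that behavior, assuming a watermark that records the end of the last successful run (the `NightlyJob` class, its fields, and the sample data are hypothetical and not part of the described system):

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class NightlyJob:
    """Hypothetical nightly aggregation that advances its watermark only on success."""

    last_success: datetime  # end of the last window that completed successfully
    results: dict = field(default_factory=dict)  # run time -> aggregate value

    def run(self, records, now, interrupted=False):
        # Select everything newer than the last *successful* run, not just the
        # last 24 hours, so a failed night's data rolls into the next run.
        window = [value for ts, value in records if self.last_success < ts <= now]
        if interrupted:
            # e.g. a Spot interruption: nothing is committed, watermark is unchanged
            return None
        total = sum(window)
        self.results[now] = total
        self.last_success = now  # commit the watermark only after the run completes
        return total


# Three days of hourly data points; the second night's run is interrupted.
start = datetime(2024, 1, 1)
records = [(start + timedelta(hours=h), 1) for h in range(1, 73)]
job = NightlyJob(last_success=start)
print(job.run(records, start + timedelta(days=1)))                    # 24
print(job.run(records, start + timedelta(days=2), interrupted=True))  # None
print(job.run(records, start + timedelta(days=3)))                    # 48
```

The third run aggregates both the second and third days, so losing a night costs only a delay, not data.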

Which is the most cost-effective design?

Update the ingestion process to use Amazon Kinesis Data Firehose to save data to Amazon S3. Use a fleet of On-Demand EC2 instances that launches each night to perform the batch processing of the S3 data and terminates when the processing completes.

Update the ingestion process to use Amazon Kinesis Data Firehose to save data to Amazon S3. Use AWS Batch to perform nightly processing with a Spot market bid of 50% of the On-Demand price.

Update the ingestion process to use a fleet of EC2 Reserved Instances behind a Network Load Balancer with 3-year reservations. Use AWS Batch with Spot Instances with a maximum bid of 50% of the On-Demand price for the nightly processing.

Update the ingestion process to use Amazon Kinesis Data Firehose to save data to Amazon Redshift. Use an AWS Lambda function scheduled to run nightly with Amazon CloudWatch Events to query Amazon Redshift to generate the daily statistics.

Explanations:

Using Amazon Kinesis Data Firehose to save data to Amazon S3 is a good choice for the streaming ingestion, but running the nightly processing on On-Demand EC2 instances misses the largest remaining saving. The nightly job is not mission critical, and a failed run's data is picked up by the next execution, so the job can tolerate Spot interruptions. Paying On-Demand rates for it forgoes the discount that Spot Instances (for example, through AWS Batch) would provide, which makes this option less cost-effective than the AWS Batch option.
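A back-of-the-envelope comparison of the nightly processing cost. The hourly rate and fleet size below are made-up placeholders, not AWS prices; the point is only that the 4-hour job's cost scales with the price per instance-hour, so capping the Spot price at 50% of On-Demand keeps that cost at or below half. The sketch ignores any extra hours spent re-running interrupted work.

```python
def yearly_nightly_cost(instances, hours_per_night, hourly_rate, nights=365):
    """Yearly compute cost of a fleet that runs only during the nightly window."""
    return instances * hours_per_night * hourly_rate * nights


ON_DEMAND_RATE = 1.00  # hypothetical $/instance-hour, not a real AWS price
SPOT_CAP = 0.50        # Spot price capped at 50% of On-Demand

fleet_size = 4         # hypothetical number of processing instances
on_demand = yearly_nightly_cost(fleet_size, 4, ON_DEMAND_RATE)
spot_ceiling = yearly_nightly_cost(fleet_size, 4, ON_DEMAND_RATE * SPOT_CAP)

print(f"On-Demand: ${on_demand:,.0f}/yr, Spot ceiling: ${spot_ceiling:,.0f}/yr")
# On-Demand: $5,840/yr, Spot ceiling: $2,920/yr
```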

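This option uses Kinesis Data Firehose to save data to S3, which removes the always-on ingestion servers and their EBS volumes and is cost-effective for ingestion. AWS Batch then runs the nightly processing on Spot Instances, capping the Spot price at 50% of the On-Demand price. Because the analysis is not mission critical and a failed run's data is picked up on the next execution, occasional Spot interruptions are acceptable, so the company can substantially reduce the cost of the nightly processing while Batch handles scaling. This makes it the most cost-effective design.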
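A sketch of what the Spot side of this design could look like as a managed AWS Batch compute environment. It only builds the keyword arguments for boto3's `create_compute_environment` call and sends nothing; the environment name, subnet, security group, and role values are hypothetical placeholders, and the vCPU limits are assumptions rather than sizing guidance.

```python
def spot_compute_environment(name, subnets, security_group_ids, instance_role,
                             spot_fleet_role, bid_percentage=50, max_vcpus=64):
    """Build keyword arguments for boto3's Batch create_compute_environment call."""
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",  # AWS Batch provisions and scales the instances
        "state": "ENABLED",
        "computeResources": {
            "type": "SPOT",
            "allocationStrategy": "BEST_FIT",
            # Launch Spot Instances only while their price is within this
            # percentage of the On-Demand price for the instance type.
            "bidPercentage": bid_percentage,
            "minvCpus": 0,  # scale to zero between nightly runs
            "maxvCpus": max_vcpus,
            "instanceTypes": ["optimal"],
            "subnets": subnets,
            "securityGroupIds": security_group_ids,
            "instanceRole": instance_role,
            "spotIamFleetRole": spot_fleet_role,
        },
    }


params = spot_compute_environment(
    name="nightly-stats-spot",                                       # hypothetical
    subnets=["subnet-0123456789abcdef0"],                            # hypothetical
    security_group_ids=["sg-0123456789abcdef0"],                     # hypothetical
    instance_role="ecsInstanceRole",                                 # hypothetical
    spot_fleet_role="arn:aws:iam::123456789012:role/SpotFleetRole",  # hypothetical
)
print(params["computeResources"]["bidPercentage"])  # 50
```

With credentials configured, the dictionary would be passed as `boto3.client("batch").create_compute_environment(**params)`; a job queue and job definition are still needed before the nightly job can be submitted.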

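This option does not use Kinesis Data Firehose at all: it keeps an always-on fleet of EC2 instances for ingestion, now behind a Network Load Balancer and committed for 3 years. Even with the Reserved Instance discount, the company pays for that capacity, the load balancer, and the EBS storage around the clock and still has to manage the fleet, whereas Kinesis Data Firehose is a managed service billed by the volume of data ingested and S3 storage costs less per GB than EBS. Running the nightly processing with AWS Batch on Spot Instances is sound, but it does not offset the cost of the reserved ingestion fleet.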
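For contrast, the managed ingestion path that replaces the EC2 fleet can be sketched as the request for a Firehose delivery stream that writes to S3. As with any sketch here, it only builds the keyword arguments for boto3's `create_delivery_stream`; the stream name, bucket, role ARN, prefix, and buffering values are hypothetical.

```python
def firehose_to_s3(stream_name, bucket_arn, role_arn,
                   prefix="market-data/", buffer_mb=64, buffer_seconds=300):
    """Build keyword arguments for boto3's Firehose create_delivery_stream call."""
    return {
        "DeliveryStreamName": stream_name,
        "DeliveryStreamType": "DirectPut",  # producers write straight to the stream
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,  # role Firehose assumes to write to the bucket
            "BucketARN": bucket_arn,
            "Prefix": prefix,
            # Firehose flushes a batch to S3 when either threshold is reached first.
            "BufferingHints": {"SizeInMBs": buffer_mb,
                               "IntervalInSeconds": buffer_seconds},
            "CompressionFormat": "GZIP",
        },
    }


params = firehose_to_s3(
    stream_name="market-data-ingest",                        # hypothetical
    bucket_arn="arn:aws:s3:::example-market-data",           # hypothetical
    role_arn="arn:aws:iam::123456789012:role/FirehoseToS3",  # hypothetical
)
print(params["ExtendedS3DestinationConfiguration"]["BufferingHints"])
```

The dictionary would be sent with `boto3.client("firehose").create_delivery_stream(**params)`, after which producers call `put_record` or `put_record_batch`; there are no ingestion servers or EBS volumes left to reserve.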

Kinesis Data Firehose can deliver data to Amazon Redshift, but this option means paying for a Redshift cluster around the clock just to run one nightly aggregation, which costs far more than keeping the data in S3. The scheduled Lambda function is also a poor fit: a Lambda invocation can run for at most 15 minutes, while the nightly analysis currently takes about 4 hours, so the function could at best submit the query rather than perform the processing itself. For this workload, AWS Batch on Spot Instances is the cheaper and more suitable choice.
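A quick check of the Lambda time limit against the job length, assuming the work would have to run inside a single invocation. The 4-hour figure is the current EC2 runtime; the same aggregation might run faster inside Redshift, so treat this as a rough sanity check rather than a benchmark. The function name is hypothetical.

```python
LAMBDA_MAX_TIMEOUT_S = 15 * 60  # Lambda's maximum configurable timeout: 900 seconds


def fits_in_single_invocation(runtime_hours):
    """True if a job of this length could finish inside one Lambda invocation."""
    return runtime_hours * 3600 <= LAMBDA_MAX_TIMEOUT_S


print(fits_in_single_invocation(4))     # False: 14,400 s > 900 s
print(fits_in_single_invocation(0.25))  # True: exactly 900 s
```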