Which solution will meet these requirements MOST cost-effectively?
Create a serverless data pipeline. Use AWS Step Functions for orchestration. Use AWS Lambda functions with provisioned capacity to process the data.
Create an AWS Batch compute environment that includes Amazon EC2 Spot Instances. Specify the SPOT_CAPACITY_OPTIMIZED allocation strategy.
Create an AWS Batch compute environment that includes Amazon EC2 On-Demand Instances and Spot Instances. Specify the SPOT_CAPACITY_OPTIMIZED allocation strategy for the Spot Instances.
Use Amazon Elastic Kubernetes Service (Amazon EKS) to run the processing jobs. Use managed node groups that contain a combination of Amazon EC2 On-Demand Instances and Spot Instances.
Explanations:
While a serverless data pipeline built on AWS Step Functions and AWS Lambda may seem appealing, a single Lambda invocation can run for at most 15 minutes, which makes Lambda a poor fit for batch jobs that process 15-20 GB of data. Lambda is also billed by allocated memory and execution duration, and provisioned concurrency is charged for as long as it is configured, even when no jobs are running. This option would therefore likely cost more than the alternatives.
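A quick back-of-the-envelope check makes the timeout argument concrete. The sketch below assumes, as a simplification, that one invocation handles the whole batch and that runtime depends only on data size and a sustained throughput; the function names and the throughput figures are hypothetical. It computes the minimum throughput needed to finish inside Lambda's 900-second limit.

```python
# Lambda's maximum function timeout is 15 minutes (900 seconds).
LAMBDA_MAX_TIMEOUT_S = 15 * 60


def required_throughput_mb_s(data_gb: float, timeout_s: int = LAMBDA_MAX_TIMEOUT_S) -> float:
    """Minimum sustained throughput (MB/s) to process data_gb within timeout_s.

    Uses decimal units (1 GB = 1000 MB) to match the sizes in the question.
    """
    return data_gb * 1000 / timeout_s


def fits_in_one_invocation(data_gb: float, throughput_mb_s: float) -> bool:
    """True if processing data_gb at throughput_mb_s would finish before the timeout."""
    runtime_s = data_gb * 1000 / throughput_mb_s
    return runtime_s <= LAMBDA_MAX_TIMEOUT_S


for size_gb in (15, 20):
    print(f"{size_gb} GB needs at least {required_throughput_mb_s(size_gb):.1f} MB/s")
```

For example, at an assumed 10 MB/s a 20 GB batch would take 2,000 seconds, more than twice the limit, so the work would have to be split across many invocations and coordinated by Step Functions.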
Using AWS Batch with EC2 Spot Instances is cost-effective for large batch-processing jobs because Spot Instances can cost up to 90% less than On-Demand Instances. The SPOT_CAPACITY_OPTIMIZED allocation strategy launches instances from the Spot capacity pools that are least likely to be interrupted, so jobs run more reliably. This combination meets the company's requirements at the lowest cost.
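As a rough illustration of the configuration this option describes, the sketch below builds the request parameters for the AWS Batch CreateComputeEnvironment API in the form accepted by boto3's `create_compute_environment`. The environment name, subnet ID, security group ID, instance profile ARN, and vCPU limits are hypothetical placeholders. The code only builds and prints the dictionary, so it runs without AWS credentials.

```python
import json


def spot_compute_environment_params(name, subnets, security_group_ids,
                                    instance_role_arn, max_vcpus=256):
    """Build kwargs for boto3's Batch client create_compute_environment (Spot only)."""
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",  # AWS Batch provisions and scales the instances
        "state": "ENABLED",
        "computeResources": {
            "type": "SPOT",  # Spot Instances only, no On-Demand capacity
            # Launch from the Spot pools least likely to be interrupted
            "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
            "minvCpus": 0,  # scale down to zero instances when no jobs are queued
            "maxvCpus": max_vcpus,
            "instanceTypes": ["optimal"],  # let AWS Batch pick suitable instance types
            "subnets": subnets,
            "securityGroupIds": security_group_ids,
            "instanceRole": instance_role_arn,
        },
    }


params = spot_compute_environment_params(
    name="nightly-batch-spot",                      # hypothetical name
    subnets=["subnet-0123456789abcdef0"],           # hypothetical subnet
    security_group_ids=["sg-0123456789abcdef0"],    # hypothetical security group
    instance_role_arn="arn:aws:iam::111122223333:instance-profile/ecsInstanceRole",  # hypothetical
)
print(json.dumps(params, indent=2))

# With credentials configured, the same dict could be passed to:
#   boto3.client("batch").create_compute_environment(**params)
```

Setting `minvCpus` to 0 means the environment holds no running instances between batch runs, so compute is paid for only while jobs are actually executing.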
This option adds flexibility by combining On-Demand and Spot Instances, but because the jobs can tolerate interruptions, the On-Demand Instances add cost without a matching benefit. When the goal is to minimize cost, relying entirely on Spot Instances is the better choice.
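The cost argument can be made explicit with a blended-price calculation. In the sketch below, the On-Demand price, the Spot discount, and the On-Demand shares are hypothetical inputs, not real AWS prices. Whenever Spot is cheaper than On-Demand, any On-Demand share above zero raises the average cost per instance-hour above the all-Spot rate.

```python
def blended_hourly_cost(on_demand_price, spot_discount, on_demand_share):
    """Average cost per instance-hour for a fleet mixing On-Demand and Spot capacity.

    on_demand_price: hypothetical On-Demand price per instance-hour
    spot_discount:   fraction saved by Spot relative to On-Demand (0.0 to 1.0)
    on_demand_share: fraction of the fleet running On-Demand (0.0 to 1.0)
    """
    spot_price = on_demand_price * (1 - spot_discount)
    return on_demand_share * on_demand_price + (1 - on_demand_share) * spot_price


price, discount = 1.00, 0.70  # hypothetical values for illustration only
for share in (0.0, 0.25, 0.5):
    cost = blended_hourly_cost(price, discount, share)
    print(f"On-Demand share {share:.0%}: ${cost:.3f} per instance-hour")
```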
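Amazon EKS can orchestrate the jobs flexibly and can use Spot Instances, but running a Kubernetes cluster adds operational overhead and complexity that AWS Batch avoids. EKS also charges an hourly fee for each cluster's control plane, whereas AWS Batch has no additional charge beyond the compute resources it launches. In addition, the On-Demand Instances in the mixed managed node groups add cost that a Spot-only environment does not incur. As a result, this option is less cost-effective than running the jobs in an AWS Batch environment that uses only Spot Instances.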