Which solution will meet the company’s requirements?
Create a container image in Amazon Elastic Container Registry (Amazon ECR) with the executable file for the job. Use Amazon ECS with Spot Fleet in Auto Scaling groups. Store the raw data in Amazon EBS sc1 volumes and write the output to Amazon S3.
Create an Amazon EMR cluster with a combination of On-Demand and Reserved Instance task nodes that will use Spark to pull data from Amazon S3. Use Amazon DynamoDB to maintain a list of jobs that need to be processed by the Amazon EMR cluster.
Store the raw data in Amazon S3, and use AWS Batch with Managed Compute Environments to create Spot Fleets. Submit jobs to AWS Batch Job Queues to pull down objects from Amazon S3 onto Amazon EBS volumes for temporary storage to be processed, and then write the results back to Amazon S3.
Submit the list of jobs to be processed to an Amazon SQS queue. Create a diversified cluster of Amazon EC2 worker instances using Spot Fleet that will automatically scale based on the queue depth. Use Amazon EFS to store all the data, sharing it across all instances in the cluster.
Explanations:
Using Amazon ECS with Spot Fleet and storing the raw data on EBS sc1 volumes is not cost-effective for processing large amounts of data. EBS volumes are poorly suited to the high throughput required to process petabytes of images: an sc1 volume attaches to a single instance, so the data cannot easily be shared across a fleet of workers. EBS also requires more management overhead than Amazon S3.
While Amazon EMR is a good choice for processing large datasets, a combination of On-Demand and Reserved Instance task nodes may not be cost-effective for a fluctuating workload: Reserved Instances are paid for whether or not they are used, and On-Demand capacity costs more than Spot capacity. Maintaining the list of jobs in DynamoDB also adds management overhead, because the job-tracking logic must be built and operated separately.
Storing the raw data in Amazon S3 and using AWS Batch with Managed Compute Environments is a scalable and cost-effective solution. AWS Batch handles job scheduling and scaling automatically, and Spot Fleets reduce compute costs while the jobs process data pulled from S3. EBS is appropriate here because it serves only as temporary working storage for each job, while S3 remains the durable store for both the raw data and the results.
Using Amazon SQS to queue jobs and scaling EC2 instances with Spot Fleet can be complex to manage, especially for a large number of jobs, because the scaling and job-handling logic must be built and operated by the team. While Amazon EFS can share data across instances, it may not provide the same performance and cost efficiency as S3 for large datasets, and managing the EC2 instances directly increases operational overhead.
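To make the AWS Batch option more concrete, the following is a minimal sketch of how one job per S3 object might be submitted with boto3. It is an illustration under stated assumptions, not part of the question: the queue name, job definition name, environment variable names, and the /scratch mount path are all hypothetical, and the job definition and Spot-backed managed compute environment are assumed to exist already. Only the request is built by the first function, so it can be inspected without AWS credentials.

```python
from typing import Any, Dict


def build_submit_job_request(
    bucket: str,
    input_key: str,
    output_prefix: str,
    job_queue: str = "image-processing-spot-queue",   # hypothetical queue name
    job_definition: str = "image-processing-job",     # hypothetical job definition
) -> Dict[str, Any]:
    """Build keyword arguments for the boto3 Batch client's submit_job call."""
    # Replace anything that is not an ASCII letter or digit with a hyphen,
    # and cap the length at 128 characters to stay within job name limits.
    safe = "".join(c if (c.isascii() and c.isalnum()) else "-" for c in input_key)
    return {
        "jobName": f"process-{safe}"[:128],
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            "environment": [
                # The container is assumed to read these (hypothetical) variables:
                # download the input from S3 to scratch space, process it,
                # then upload the result back to S3.
                {"name": "INPUT_S3_URI", "value": f"s3://{bucket}/{input_key}"},
                {"name": "OUTPUT_S3_URI",
                 "value": f"s3://{bucket}/{output_prefix}/{input_key}"},
                {"name": "SCRATCH_DIR", "value": "/scratch"},  # assumed EBS-backed path
            ]
        },
        # Retry a few times, since Spot capacity can be reclaimed mid-job.
        "retryStrategy": {"attempts": 3},
    }


def submit(request: Dict[str, Any]) -> str:
    """Submit the request to AWS Batch and return the job ID (needs credentials)."""
    import boto3  # imported lazily so the request builder works without boto3

    return boto3.client("batch").submit_job(**request)["jobId"]


if __name__ == "__main__":
    example = build_submit_job_request("raw-images", "batch1/img 001.tif", "results")
    print(example["jobName"])
```

With this arrangement the job list lives in the AWS Batch job queue rather than in a separate DynamoDB table or SQS queue, which is the operational saving the explanation for this option describes.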