Which solution meets these requirements?
Use regularly scheduled AWS Snowball Edge devices to transfer the sequencing data into AWS. When AWS receives the Snowball Edge device and the data is loaded into Amazon S3, use S3 events to trigger an AWS Lambda function to process the data.
Use AWS Data Pipeline to transfer the sequencing data to Amazon S3. Use S3 events to trigger an Amazon EC2 Auto Scaling group to launch custom-AMI EC2 instances running the Docker containers to process the data.
Use AWS DataSync to transfer the sequencing data to Amazon S3. Use S3 events to trigger an AWS Lambda function that starts an AWS Step Functions workflow. Store the Docker images in Amazon Elastic Container Registry (Amazon ECR) and trigger AWS Batch to run the container and process the sequencing data.
Use an AWS Storage Gateway file gateway to transfer the sequencing data to Amazon S3. Use S3 events to trigger an AWS Batch job that executes on Amazon EC2 instances running the Docker containers to process the data.
Explanations:
AWS Snowball Edge is designed for bulk data transfer, but every transfer requires physically shipping a device, which adds days of latency and is impractical for a recurring workload of 10-15 jobs each day. In addition, AWS Lambda is a poor fit for genomics analysis: a single invocation is limited to 15 minutes of run time and 10 GB of memory, while these jobs need substantial, long-running compute.
AWS Data Pipeline is a workflow orchestration service for moving and transforming data between services. It is not built for high-throughput transfer of large genomic datasets into Amazon S3. S3 event notifications also cannot invoke an EC2 Auto Scaling group directly, so this design needs extra integration work, and maintaining a custom AMI and scaling policies adds operational overhead without clearly reducing turnaround time.
AWS DataSync is optimized for transferring large amounts of data to Amazon S3, which makes it suitable for moving roughly 200 GB per genome. When new data arrives, S3 events invoke an AWS Lambda function that starts an AWS Step Functions workflow; the workflow orchestrates the processing steps and submits them to AWS Batch, which scales compute to match the workload. Storing the Docker images in Amazon ECR gives AWS Batch a managed registry from which to pull the existing containers.
An AWS Storage Gateway file gateway is intended for hybrid storage, where on-premises applications need ongoing file-protocol access (NFS or SMB) to data backed by Amazon S3. It can move the sequencing data into S3, but it adds a gateway to operate and is not optimized for bulk transfer the way AWS DataSync is. S3 events can start AWS Batch jobs (for example, through an Amazon EventBridge rule), so the processing half of this option is feasible; the weaker part is the transfer mechanism.
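To make the event-driven step in the DataSync option concrete, here is a minimal sketch of the Lambda function that an S3 event could invoke to start the Step Functions workflow. It is an illustration under stated assumptions, not a reference implementation: the `STATE_MACHINE_ARN` environment variable, the `build_execution_input` helper, and the shape of the workflow input are hypothetical, and the state machine itself (which would submit the AWS Batch job) is assumed to exist already.

```python
import json
import os
from urllib.parse import unquote_plus


def build_execution_input(event):
    """Collect the bucket and key of every object named in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        objects.append({
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in S3 event notifications.
            "key": unquote_plus(s3["object"]["key"]),
        })
    return {"objects": objects}


def handler(event, context, sfn_client=None):
    """Lambda entry point: start one workflow execution per S3 event."""
    if sfn_client is None:
        # Imported lazily so the parsing logic can be exercised offline.
        import boto3
        sfn_client = boto3.client("stepfunctions")

    payload = build_execution_input(event)
    if not payload["objects"]:
        return None  # nothing to process

    response = sfn_client.start_execution(
        # Hypothetical variable name; set it in the function configuration.
        stateMachineArn=os.environ["STATE_MACHINE_ARN"],
        input=json.dumps(payload),
    )
    return response["executionArn"]
```

The function only hands the object locations to the workflow and returns, so it stays well inside Lambda's limits; the long-running analysis happens in the AWS Batch job that the workflow submits. The optional `sfn_client` parameter is there only so a stub client can be passed in during testing.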