Which solution will meet these requirements with the LEAST development effort?
Create an Amazon EMR cluster with Apache Spark installed. Write a Spark application to transform the data. Use EMR File System (EMRFS) to write files to the transformed data bucket.
Create an AWS Glue crawler to discover the data. Create an AWS Glue extract, transform, and load (ETL) job to transform the data. Specify the transformed data bucket in the output step.
Use AWS Batch to create a job definition with Bash syntax to transform the data and output the data to the transformed data bucket. Use the job definition to submit a job. Specify an array job as the job type.
Create an AWS Lambda function to transform the data and output the data to the transformed data bucket. Configure an event notification for the S3 bucket. Specify the Lambda function as the destination for the event notification.
Explanations:
While an Amazon EMR cluster can transform the data, it requires far more setup and management than the task demands: you must provision and configure the cluster, write and tune a Spark application, and manage the cluster lifecycle, all of which add development effort.
AWS Glue provides a serverless, managed ETL service that simplifies discovering, transforming, and loading data. A Glue crawler infers the schema automatically and registers it in the Glue Data Catalog, and a Glue ETL job can then transform the data and write it to the transformed data bucket with no infrastructure to manage. This requires the least development effort because the service is purpose-built for exactly this task.
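To make the Glue flow concrete, here is a minimal sketch of creating the crawler and ETL job with boto3. All names (crawler, database, buckets, role ARN, script location) are hypothetical placeholders, and in practice these resources are often created through the console or Glue Studio rather than code:

```python
# Job configuration for a Glue ETL job. Every name, ARN, and S3 path
# below is a hypothetical placeholder, not a value from the question.
GLUE_JOB_CONFIG = {
    "Name": "transform-raw-data",
    "Role": "arn:aws:iam::123456789012:role/GlueServiceRole",  # hypothetical
    "Command": {
        "Name": "glueetl",  # standard Spark-based Glue ETL job type
        "ScriptLocation": "s3://scripts-bucket/transform.py",  # hypothetical
        "PythonVersion": "3",
    },
    "GlueVersion": "4.0",
}


def create_glue_resources():
    """Sketch of provisioning: a crawler to discover the raw data,
    then an ETL job that writes to the transformed data bucket."""
    # boto3 is imported here so the config above can be inspected
    # without AWS credentials; this function needs real AWS access.
    import boto3

    glue = boto3.client("glue")
    # The crawler infers the schema and populates the Data Catalog.
    glue.create_crawler(
        Name="raw-data-crawler",           # hypothetical
        Role=GLUE_JOB_CONFIG["Role"],
        DatabaseName="raw_db",             # hypothetical
        Targets={"S3Targets": [{"Path": "s3://raw-data-bucket/"}]},
    )
    # The ETL job's script reads the cataloged table and specifies the
    # transformed data bucket as its output target.
    glue.create_job(**GLUE_JOB_CONFIG)
```

The transformation logic itself lives in the job script, which Glue Studio can also generate visually, keeping hand-written code to a minimum.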
AWS Batch can handle large-scale jobs, but writing the transformation logic in Bash, packaging it into a job definition, and submitting array jobs adds significant development complexity compared with Glue's managed ETL for this use case.
While AWS Lambda can be used for this transformation, it is constrained by its 15-minute execution timeout and memory and temporary-storage limits, so large files may not be handled efficiently. It also requires additional configuration for the S3 event notification and for working around those limits, making it less optimal than AWS Glue.
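For comparison, the Lambda option would look roughly like the sketch below. The event shape follows the documented S3 notification format; the CSV-to-JSON-Lines transform and the TRANSFORMED_BUCKET environment variable are illustrative assumptions, since the question does not specify the transformation:

```python
import csv
import io
import json
import os


def transform_csv_to_jsonl(csv_text):
    """Stand-in transformation: convert CSV text to JSON Lines.
    The actual transformation in the scenario is unspecified."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(json.dumps(dict(row)) for row in rows)


def lambda_handler(event, context):
    # boto3 is preinstalled in the Lambda runtime; imported here so the
    # pure transform above can be exercised without AWS access.
    import boto3

    s3 = boto3.client("s3")
    # S3 event notifications deliver bucket and key under Records[].s3.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    s3.put_object(
        Bucket=os.environ["TRANSFORMED_BUCKET"],  # hypothetical env var
        Key=key.rsplit(".", 1)[0] + ".jsonl",
        Body=transform_csv_to_jsonl(body),
    )
```

Even in sketch form, you can see the extra pieces this option requires: the event-notification wiring, the per-object invocation model, and staying within Lambda's time and memory limits, none of which apply to a Glue job.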