Which solution will meet these requirements MOST cost-effectively?
Create an Amazon Aurora MySQL database. Migrate the data from the S3 bucket into Aurora by using AWS Database Migration Service (AWS DMS). Issue SQL statements to the Aurora database.
Create an Amazon Redshift cluster. Use Redshift Spectrum to run SQL statements directly on the data in the S3 bucket.
Create an AWS Glue crawler to store and retrieve table metadata from the S3 bucket. Use Amazon Athena to run SQL statements directly on the data in the S3 bucket.
Create an Amazon EMR cluster. Use Apache Spark SQL to run SQL statements directly on the data in the S3 bucket.
Explanations:
Aurora MySQL is a relational database and cannot query the log data in place in S3. Migrating the data with AWS DMS would mean loading and storing a second copy of it in Aurora and paying for database instances and storage, which adds unnecessary complexity and cost.
Redshift Spectrum can query data directly in S3, but it runs from an Amazon Redshift cluster, which is billed for as long as it is running, on top of the Spectrum charge for data scanned. For 10 TB of log files, that makes it more expensive and more complex than a simpler solution like Athena.
An AWS Glue crawler can automatically catalog the data, and Amazon Athena can then query it directly in S3 using SQL, without moving or transforming it. This is the most cost-effective solution: Athena is serverless and charges based on the amount of data each query scans, so there is no infrastructure to provision or manage.
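To make the pay-per-scan pricing concrete, here is a minimal Python sketch that estimates what a single Athena query costs. The price of 5 USD per TB scanned and the 10 MB minimum billed per query are assumptions, not figures from the question; check current Athena pricing for your Region. The function name `athena_query_cost` and the binary definition of a terabyte are also illustrative choices.

```python
# Rough estimate of Athena's per-query charge under assumed pricing.
TIB = 1024 ** 4                     # bytes per TB (binary definition, an assumption)
MIN_BILLED_BYTES = 10 * 1024 ** 2   # assumed 10 MB minimum billed per query


def athena_query_cost(scanned_bytes, price_per_tb_usd=5.0):
    """Return the estimated cost in USD of one query that scans `scanned_bytes`."""
    billed_bytes = max(scanned_bytes, MIN_BILLED_BYTES)
    return billed_bytes / TIB * price_per_tb_usd


if __name__ == "__main__":
    # A query that scans all 10 TB of log files.
    print(f"Full scan: ${athena_query_cost(10 * TIB):.2f}")
    # A query that scans only 1 TB, for example because it reads fewer objects.
    print(f"1 TB scan: ${athena_query_cost(1 * TIB):.2f}")
```

Under these assumptions, the cost scales with how much data each query reads, and nothing is charged for compute while no queries are running.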
While Amazon EMR can run SQL queries using Apache Spark, it requires provisioning and managing an EMR cluster, which introduces additional complexity and cost. Athena offers a more cost-effective, serverless solution for querying data directly in S3.