Which solution will meet these requirements?
Store data in Amazon S3. Use Amazon Redshift Spectrum to query data.
Store data in Amazon S3. Use the AWS Glue Data Catalog and Amazon Athena to query data.
Store data in EMR File System (EMRFS). Use Presto in Amazon EMR to query data.
Store data in Amazon Redshift. Use Amazon Redshift to query data.
Explanations:
Amazon Redshift Spectrum can query data stored in S3, but it is used from Amazon Redshift, so the company pays for Redshift compute in addition to the Spectrum charge for data scanned. For a high volume of data, this is likely to be less cost-effective than a serverless option.
Storing data in Amazon S3, cataloging it with the AWS Glue Data Catalog, and querying it with Amazon Athena provides a serverless solution. It is cost-effective because there is no infrastructure to provision and you pay only for the data each query scans, which suits the company's query patterns and time frames.
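To make "pay only for the data each query scans" concrete, here is a minimal cost-estimate sketch. It assumes a price of USD 5.00 per TB scanned (a commonly listed us-east-1 rate; check current pricing for your Region), binary units (1 TB = 1,048,576 MB), and Athena's billing rule of rounding up to the nearest MB with a 10 MB minimum per query. The function names and the 1,000-query workload are hypothetical, for illustration only.

```python
import math

# Assumption: USD 5.00 per TB scanned; the actual price varies by Region.
PRICE_PER_TB_USD = 5.00
# Athena rounds bytes scanned up to the nearest MB, with a 10 MB minimum per query.
MIN_BILLED_MB = 10
BYTES_PER_MB = 1024 ** 2
MB_PER_TB = 1024 ** 2  # assumption: binary units (1 TB = 1,048,576 MB)


def athena_query_cost(bytes_scanned: int) -> float:
    """Estimate the charge in USD for one Athena query (hypothetical helper)."""
    scanned_mb = math.ceil(bytes_scanned / BYTES_PER_MB)
    billed_mb = max(scanned_mb, MIN_BILLED_MB)
    return billed_mb / MB_PER_TB * PRICE_PER_TB_USD


def monthly_athena_cost(bytes_per_query: int, queries_per_month: int) -> float:
    """Total monthly charge when every query scans the same amount of data."""
    return athena_query_cost(bytes_per_query) * queries_per_month


if __name__ == "__main__":
    # Hypothetical workload: 1,000 queries a month, each scanning 50 GiB.
    print(f"${monthly_athena_cost(50 * 1024 ** 3, 1000):.2f}")  # $244.14
```

Nothing is billed for compute while no queries run, which is the property the explanation relies on. Partitioning the data or storing it in a columnar format reduces bytes scanned and therefore the bill.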
EMRFS lets an EMR cluster read and write data in S3, but this option still requires running an EMR cluster. A persistent cluster accrues costs whether or not queries are running, which is unlikely to be cost-effective for the company's querying needs.
Storing and querying data in Amazon Redshift costs more than storing data in S3 and querying it with Athena, because Redshift requires provisioning compute resources that the described query patterns may not need.
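The argument against the always-on options comes down to fixed versus usage-based cost. The sketch below estimates the monthly cost of a cluster that runs around the clock and the volume of data scanned per month at which a pay-per-scan service would cost the same. The USD 2.00/hour rate is a hypothetical placeholder (real EMR or Redshift rates depend on node type and count), the USD 5.00 per TB scan price is an assumption, and the model ignores storage, S3 request, and Glue Data Catalog charges.

```python
# Average number of hours in a month (8,760 hours / 12).
HOURS_PER_MONTH = 730


def always_on_monthly_cost(hourly_rate_usd: float, hours: float = HOURS_PER_MONTH) -> float:
    """Cost of a cluster that runs for the whole month, busy or idle."""
    return hourly_rate_usd * hours


def break_even_tb_per_month(hourly_rate_usd: float, price_per_tb_usd: float = 5.00,
                            hours: float = HOURS_PER_MONTH) -> float:
    """TB scanned per month at which pay-per-scan costs as much as the always-on cluster."""
    return always_on_monthly_cost(hourly_rate_usd, hours) / price_per_tb_usd


if __name__ == "__main__":
    # Hypothetical cluster at USD 2.00/hour (assumed rate, not a quoted price).
    rate = 2.00
    print(always_on_monthly_cost(rate))   # 1460.0
    print(break_even_tb_per_month(rate))  # 292.0
```

Under these assumptions, a workload that scans well under the break-even volume each month is cheaper on a pay-per-scan service, while the provisioned cluster's cost stays the same even when it sits idle.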