Which solution will meet these requirements?

A solutions architect manages an analytics application. The application stores large amounts of semistructured data in an Amazon S3 bucket. The solutions architect wants to use parallel data processing to process the data more quickly. The solutions architect also wants to use information that is stored in an Amazon Redshift database to enrich the data.

Which solution will meet these requirements?

Use Amazon Athena to process the S3 data. Use AWS Glue with the Amazon Redshift data to enrich the S3 data.

Use Amazon EMR to process the S3 data. Use Amazon EMR with the Amazon Redshift data to enrich the S3 data.

Use Amazon EMR to process the S3 data. Use Amazon Kinesis Data Streams to move the S3 data into Amazon Redshift so that the data can be enriched.

Use AWS Glue to process the S3 data. Use AWS Lake Formation with the Amazon Redshift data to enrich the S3 data.

Explanations:

Amazon Athena is a serverless interactive query service that can query the S3 data, but this option splits the work across two services. AWS Glue would have to pull the Amazon Redshift data in a separate job, and combining its output with the Athena results takes additional steps, so the enrichment is not streamlined.

Amazon EMR (Elastic MapReduce) is a big data platform that runs frameworks such as Apache Spark and Hadoop to process large amounts of data in parallel. A single EMR cluster can read both the semistructured data in S3 and the structured data in Amazon Redshift, so the S3 data can be processed and enriched in one place. This option meets both the parallel processing requirement and the data enrichment requirement.
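The "process in parallel, then enrich from a lookup table" pattern described above can be sketched in plain Python. This is only a conceptual illustration, not EMR or Spark code: the records, the `CUSTOMERS` lookup table, and all function names are hypothetical stand-ins for S3 objects and Redshift rows, and a thread pool stands in for the cluster workers (CPython threads do not give true CPU parallelism).

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical semistructured records, standing in for JSON lines read from S3.
RAW_LINES = [
    '{"order_id": 1, "customer_id": "c1", "items": [{"sku": "a", "qty": 2}]}',
    '{"order_id": 2, "customer_id": "c2", "items": []}',
    '{"order_id": 3, "customer_id": "c9"}',
    '{"order_id": 4, "customer_id": "c1", "items": [{"sku": "b", "qty": 1}, {"sku": "c", "qty": 3}]}',
]

# Hypothetical lookup table, standing in for rows loaded from Amazon Redshift.
CUSTOMERS = {"c1": {"segment": "enterprise"}, "c2": {"segment": "retail"}}


def partition(items, n):
    """Split items into n partitions (round-robin)."""
    return [items[i::n] for i in range(n)]


def enrich_partition(lines, lookup):
    """Parse each JSON line and attach the matching lookup attributes."""
    out = []
    for line in lines:
        record = json.loads(line)
        match = lookup.get(record["customer_id"], {})
        record["segment"] = match.get("segment", "unknown")
        # Tolerate records that have no "items" field at all.
        record["total_qty"] = sum(i["qty"] for i in record.get("items", []))
        out.append(record)
    return out


def enrich_in_parallel(lines, lookup, workers=2):
    """Enrich each partition concurrently, then merge and sort the results."""
    parts = partition(lines, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: enrich_partition(p, lookup), parts))
    merged = [r for part in results for r in part]
    return sorted(merged, key=lambda r: r["order_id"])


enriched = enrich_in_parallel(RAW_LINES, CUSTOMERS)
```

On a real EMR cluster the same shape would typically be expressed as a distributed join in a framework such as Spark, with the partitions spread across cluster nodes rather than threads.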

While Amazon EMR can process data from S3, using Amazon Kinesis Data Streams to move S3 data into Amazon Redshift for enrichment introduces unnecessary complexity. Kinesis is typically used for real-time data streaming rather than batch processing, making this option less suitable for the use case described.

AWS Glue is a serverless ETL (extract, transform, load) service that can process the S3 data, but AWS Lake Formation is a service for building, securing, and governing data lakes. It is not a mechanism for combining Amazon Redshift data with S3 data, so this option does not directly address the enrichment requirement.
