What should a solutions architect recommend to meet these requirements?

A company has an application that scans millions of connected devices for security threats and pushes the scan logs to an Amazon S3 bucket. A total of 70 GB of data is generated each week, and the company needs to store 3 years of data for historical reporting. The company must process, aggregate, and enrich the data from Amazon S3 by performing complex analytical queries and joins in the least amount of time. The aggregated dataset is visualized on an Amazon QuickSight dashboard.

What should a solutions architect recommend to meet these requirements?

Create and run an ETL job in AWS Glue to process the data from Amazon S3 and load it into Amazon Redshift. Perform the aggregation queries on Amazon Redshift.

Use AWS Lambda functions based on S3 PutObject event triggers to copy the incremental changes to Amazon DynamoDB. Perform the aggregation queries on DynamoDB.

Use AWS Lambda functions based on S3 PutObject event triggers to copy the incremental changes to Amazon Aurora MySQL. Perform the aggregation queries on Aurora MySQL.

Use AWS Glue to catalog the data in Amazon S3. Perform the aggregation queries on the cataloged tables by using Amazon Athena. Query the data directly from Amazon S3.

Explanations:

Amazon Redshift is a fully managed data warehouse designed for high-performance complex queries, making it ideal for running aggregation queries and joins on large datasets. AWS Glue can run the ETL job that transforms the scan logs in S3 and loads them into Redshift. Redshift's columnar storage and massively parallel processing suit this kind of analytical workload over several years of historical data.
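As a rough illustration of the load-then-aggregate pattern described above, the sketch below builds a Redshift `COPY` statement and a sample aggregation query as plain strings. It assumes the ETL step has already written the processed logs to S3 as Parquet, which is only one of several ways to load Redshift. The table name, S3 path, IAM role ARN, and column names are all hypothetical placeholders and do not come from the question.

```python
# Minimal sketch: every identifier below (table, columns, bucket, role)
# is a hypothetical placeholder, not something taken from the question.


def build_copy_statement(table: str, s3_path: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement that loads Parquet files from S3."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


# Example aggregation over the loaded table: findings per week and threat type.
# The column names scan_time and threat_type are assumptions.
WEEKLY_THREATS_SQL = """
SELECT DATE_TRUNC('week', scan_time) AS week,
       threat_type,
       COUNT(*) AS findings
FROM scan_logs
GROUP BY 1, 2
ORDER BY 1, 2;
""".strip()


if __name__ == "__main__":
    print(build_copy_statement(
        "scan_logs",
        "s3://example-bucket/processed/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    ))
    print(WEEKLY_THREATS_SQL)
```

The aggregated result of a query like this is what a QuickSight dashboard would read from Redshift.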

While DynamoDB is a scalable NoSQL database, it is not designed for complex analytical queries and has no native support for joins. It is better suited to low-latency key-value lookups than to large-scale aggregation and reporting, so using Lambda to copy the data into DynamoDB does not fit this use case.

Aurora MySQL is a relational database, but it is not optimized for large-scale data processing and complex analytical queries involving aggregations and joins across years of scan logs. Aurora is built for transactional workloads, so Redshift is the more suitable choice for large-scale analytical processing.

Athena is a serverless query service that runs SQL directly against data in S3, but it is less well suited than Redshift to repeated, complex aggregation and join queries. Athena is ideal for simple, ad hoc querying, but for large-scale processing and complex analytics, Redshift is typically the better solution.
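The explanations above all turn on the dataset being large. A quick back-of-the-envelope estimate, using only the figures in the question (70 GB per week, 3 years of retention), gives a sense of scale. The sketch assumes 52 weeks per year and decimal units (1 TB = 1000 GB); the function name is illustrative.

```python
# Figures taken from the question.
GB_PER_WEEK = 70
RETENTION_YEARS = 3

# Assumption: 52 weeks per year, decimal terabytes (1 TB = 1000 GB).
WEEKS_PER_YEAR = 52


def retained_volume_gb(gb_per_week: int, years: int,
                       weeks_per_year: int = WEEKS_PER_YEAR) -> int:
    """Estimate the raw data volume held at the end of the retention period."""
    return gb_per_week * weeks_per_year * years


total_gb = retained_volume_gb(GB_PER_WEEK, RETENTION_YEARS)
total_tb = total_gb / 1000

print(f"{total_gb} GB is about {total_tb:.1f} TB")  # 10920 GB is about 10.9 TB
```

This is the raw log volume only; the size after compression or conversion to a columnar format would differ.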
