Which combination of steps will ensure optimal performance as the data grows?
(Choose two.)
Store each object in Amazon S3 with a random string at the front of each key.
Store the data in multiple S3 buckets.
Store the data in Amazon S3 in a columnar format, such as Apache Parquet or Apache ORC.
Store the data in Amazon S3 in objects that are smaller than 10 MB.
Store the data in Amazon S3 using Apache Hive partitioning with a key that includes a date, such as dt=2019-02.
Explanations:
Storing objects with a random string at the front of each key does not optimize performance for Athena queries. Random prefixes were an older recommendation for spreading S3 request load, but S3 now scales automatically per prefix, and randomized keys prevent the logical, partition-friendly layout that Athena needs to prune data during queries.
Spreading the data across multiple S3 buckets does not improve performance for Athena analysis. Athena reads whatever location a table points to, so keeping the data in a single bucket with a well-structured, partitioned key layout is simpler and more efficient to query.
Storing data in a columnar format like Apache Parquet or Apache ORC optimizes performance for Athena: because these formats are column-oriented and compress well, Athena reads only the columns a query references and scans far less data overall.
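As a rough illustration of that conversion step, the sketch below uses pandas and pyarrow to rewrite a CSV extract as Snappy-compressed Parquet and upload it with boto3. The bucket name, file names, and key layout are assumptions made for the example, not details from the question.

# Sketch: convert a row-oriented CSV export to Parquet before uploading to S3.
# Bucket, file names, and key layout are placeholders.
import boto3
import pandas as pd

df = pd.read_csv("sales_2019-02.csv")              # raw row-oriented data
df.to_parquet("sales_2019-02.parquet",             # columnar, compressed output
              engine="pyarrow",
              compression="snappy")

s3 = boto3.client("s3")
s3.upload_file("sales_2019-02.parquet",
               "example-analytics-bucket",
               "sales/dt=2019-02/sales.parquet")   # key already follows a Hive-style partition layout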
Storing objects smaller than 10 MB leads to inefficiencies: every object read adds request and task-scheduling overhead. For optimal performance, larger objects (typically between 128 MB and 1 GB) are preferable because they reduce the cost of managing many small files.
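A common remediation is a periodic compaction job. The sketch below is a minimal local example with placeholder paths, and it assumes the combined data fits in memory; it merges many small Parquet files into one larger object with pyarrow before upload.

# Sketch: compact many small Parquet files into one larger object
# so Athena reads fewer, bigger files. Paths are placeholders.
import glob
import pyarrow as pa
import pyarrow.parquet as pq

small_files = glob.glob("staging/*.parquet")        # e.g. hundreds of <10 MB files
tables = [pq.read_table(path) for path in small_files]
combined = pa.concat_tables(tables)                 # one in-memory table

pq.write_table(combined, "compacted/sales.parquet",
               compression="snappy")                # single larger object to upload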
Using Apache Hive partitioning based on date keys (e.g., dt=2019-02) in S3 allows Athena to quickly filter and scan relevant data segments. This significantly improves query performance and reduces costs by limiting the amount of data scanned.
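For illustration only, the sketch below writes Hive-partitioned Parquet (dt=... key prefixes) with pandas and then runs a date-filtered Athena query through boto3. The bucket, database, table, and column names are placeholders, and writing directly to an s3:// path assumes the s3fs package is installed.

# Sketch: write Hive-partitioned Parquet and query one month with a partition filter.
# All names are placeholders, not details from the question.
import boto3
import pandas as pd

df = pd.read_csv("sales.csv")                       # assumes a 'dt' column such as '2019-02'
df.to_parquet("s3://example-analytics-bucket/sales/",
              engine="pyarrow",
              partition_cols=["dt"])                # creates .../sales/dt=2019-02/ keys

athena = boto3.client("athena")
athena.start_query_execution(
    QueryString="SELECT SUM(amount) FROM sales WHERE dt = '2019-02'",
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-analytics-bucket/athena-results/"},
)

In practice the table's partitions would still need to be registered (for example with MSCK REPAIR TABLE or partition projection) before Athena can prune them with the dt filter.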