Which option will meet the company's requirements?

A company is running a commercial Apache Hadoop cluster on Amazon EC2. This cluster is being used daily to query large files on Amazon S3. The data on Amazon S3 has been curated and does not require any additional transformation steps. The company is using a commercial business intelligence (BI) tool on Amazon EC2 to run queries against the Hadoop cluster and visualize the data. The company wants to reduce or eliminate the overhead costs associated with managing the Hadoop cluster and the BI tool. The company would like to move to a more cost-effective solution with minimal effort. The visualization is simple and requires only some basic aggregation steps.

Which option will meet the company’s requirements?

Launch a transient Amazon EMR cluster daily and develop an Apache Hive script to analyze the files on Amazon S3. Shut down the Amazon EMR cluster when the job is complete. Then use Amazon QuickSight to connect to Amazon EMR and perform the visualization.

Develop a stored procedure invoked from a MySQL database running on Amazon EC2 to analyze the files in Amazon S3. Then use a fast in-memory BI tool running on Amazon EC2 to visualize the data.

Develop a script that uses Amazon Athena to query and analyze the files on Amazon S3. Then use Amazon QuickSight to connect to Athena and perform the visualization.

Use a commercial extract, transform, load (ETL) tool that runs on Amazon EC2 to prepare the data for processing. Then switch to a faster and cheaper BI tool that runs on Amazon EC2 to visualize the data from Amazon S3.

Explanations:

Launching a transient Amazon EMR cluster daily involves management overhead, as it requires script development and operational tasks for starting and stopping the cluster. While EMR can be cost-effective, it does not eliminate the cluster management overhead.

Using a MySQL database on Amazon EC2 introduces additional management complexity and cost for maintaining the database instance. This option also requires a separate BI tool, which may not be as cost-effective as using serverless solutions.

Amazon Athena allows for serverless querying of data in S3, eliminating the need for cluster management. It is cost-effective as you pay only for the queries you run, and it integrates seamlessly with Amazon QuickSight for visualization, fulfilling the company’s requirements with minimal effort.

Adding a commercial ETL tool introduces unnecessary complexity, because the data on Amazon S3 is already curated and needs no further transformation. Running a new BI tool on EC2 also keeps the infrastructure management overhead the company wants to remove. This option is neither more cost-effective nor simpler, since it adds extra processing steps and more infrastructure to manage.
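
To make the Athena option more concrete, here is a minimal sketch of what "a script that uses Amazon Athena" might look like. It is an illustration under stated assumptions, not the exam's reference solution: the table and column names (`sales`, `region`, `amount`), the database name, and the S3 output location are all hypothetical, and the `athena` argument is assumed to be a boto3 Athena client (or any object exposing the same `start_query_execution` and `get_query_execution` calls).

```python
import time


def build_aggregation_query(table: str, group_col: str, value_col: str) -> str:
    """Build a basic GROUP BY aggregation over a table whose data lives in S3.

    The table and column names are placeholders; Athena quotes
    identifiers in SELECT statements with double quotes.
    """
    return (
        f'SELECT "{group_col}", SUM("{value_col}") AS total '
        f'FROM "{table}" '
        f'GROUP BY "{group_col}" '
        f'ORDER BY total DESC'
    )


def run_athena_query(athena, sql, database, output_location,
                     poll_seconds=1.0, max_polls=60):
    """Submit a query and poll until it reaches a terminal state.

    `athena` is assumed to behave like boto3.client("athena").
    Returns a (query_execution_id, final_state) tuple.
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    for _ in range(max_polls):
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        # QUEUED and RUNNING are the non-terminal states; keep polling on those.
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)

    raise TimeoutError(f"Query {query_id} did not finish after {max_polls} polls")
```

In real use, the client would come from `boto3.client("athena")` and the output location would be an S3 URI the caller can write to. There is no cluster to start or stop anywhere in this flow, which is the point of the option. For the visualization step, QuickSight can use Athena as a data source directly, so a script like this mainly serves ad hoc or scheduled analysis.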
