Which solution will meet these requirements?

A company’s marketing data is uploaded from multiple sources to an Amazon S3 bucket. A series of data preparation jobs aggregate the data for reporting. The data preparation jobs need to run at regular intervals in parallel. A few jobs need to run in a specific order later. The company wants to remove the operational overhead of job error handling, retry logic, and state management.

Which solution will meet these requirements?

Use an AWS Lambda function to process the data as soon as the data is uploaded to the S3 bucket. Invoke other Lambda functions at regularly scheduled intervals.

Use Amazon Athena to process the data. Use Amazon EventBridge Scheduler to invoke Athena on a regular interval.

Use AWS Glue DataBrew to process the data. Use an AWS Step Functions state machine to run the DataBrew data preparation jobs.

Use AWS Data Pipeline to process the data. Schedule Data Pipeline to process the data once at midnight.

Explanations:

Using an AWS Lambda function to process data upon upload does not address the need for running jobs at regular intervals in parallel, as it would primarily react to individual uploads. Invoking other Lambda functions at scheduled intervals would still require manual management of state and error handling, which the company aims to eliminate.

Amazon Athena is primarily a query service for data stored in S3 and does not provide built-in mechanisms for handling job orchestration or error management. While EventBridge Scheduler can invoke Athena queries, it does not meet the requirements for managing the parallel execution of data preparation jobs or handling state.

AWS Glue DataBrew handles data preparation and transformation. An AWS Step Functions state machine manages the workflow of the DataBrew jobs: it can run them in parallel or in a specified order, and it handles retries, errors, and state, which removes the operational overhead the company wants to avoid.
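As a rough illustration of this option, the sketch below builds a Step Functions state machine definition (Amazon States Language) as a Python dictionary. It assumes the DataBrew jobs already exist; the job names (clean-ads, clean-email, clean-web, aggregate-campaigns, build-report), the state name PrepareInParallel, and the retry settings are hypothetical and are not part of the question. The parallel jobs go in a Parallel state, the ordered jobs follow as chained Task states, and each task carries a Retry block so the retry logic lives in the state machine rather than in custom code.

```python
import json

# Step Functions integration that starts a DataBrew job run;
# the ".sync" suffix makes the state wait until the job run finishes.
DATABREW_RESOURCE = "arn:aws:states:::databrew:startJobRun.sync"


def databrew_task(job_name, next_state=None):
    """Build a Task state that runs one DataBrew job with automatic retries."""
    state = {
        "Type": "Task",
        "Resource": DATABREW_RESOURCE,
        "Parameters": {"Name": job_name},
        # Illustrative retry policy (assumed values): retry any error
        # up to 3 times with exponential backoff.
        "Retry": [{
            "ErrorEquals": ["States.ALL"],
            "IntervalSeconds": 30,
            "MaxAttempts": 3,
            "BackoffRate": 2.0,
        }],
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


def build_definition(parallel_jobs, ordered_jobs):
    """Run parallel_jobs concurrently, then ordered_jobs one after another."""
    parallel_state = {
        "Type": "Parallel",
        # Each branch is a tiny state machine that runs a single job.
        "Branches": [
            {"StartAt": job, "States": {job: databrew_task(job)}}
            for job in parallel_jobs
        ],
    }
    if ordered_jobs:
        parallel_state["Next"] = ordered_jobs[0]
    else:
        parallel_state["End"] = True

    states = {"PrepareInParallel": parallel_state}
    for i, job in enumerate(ordered_jobs):
        following = ordered_jobs[i + 1] if i + 1 < len(ordered_jobs) else None
        states[job] = databrew_task(job, following)
    return {"StartAt": "PrepareInParallel", "States": states}


# Hypothetical job names, for illustration only.
definition = build_definition(
    parallel_jobs=["clean-ads", "clean-email", "clean-web"],
    ordered_jobs=["aggregate-campaigns", "build-report"],
)
print(json.dumps(definition, indent=2))
```

The printed JSON could be supplied as the definition when creating a state machine. Running it at regular intervals would still need a schedule, for example an Amazon EventBridge rule that starts the state machine, which this sketch does not cover.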

AWS Data Pipeline can schedule data processing, but it is less flexible than the other options and leaves more of the dependency management and error handling to the company. Additionally, running the jobs only once at midnight does not fulfill the requirement to run them at regular intervals.
