Which solutions will meet these requirements?
Invoke an AWS Lambda function on file delivery that extracts each record and writes it to an Amazon SQS queue. Invoke another Lambda function when new messages arrive in the SQS queue to process the records, writing the results to a temporary location in Amazon S3. Invoke a final Lambda function once the SQS queue is empty to transform the records into JSON format and send the results to another S3 bucket for internal processing.
Invoke an AWS Lambda function on file delivery that extracts each record and writes it to an Amazon SQS queue. Configure an AWS Fargate container application to automatically scale to a single instance when the SQS queue contains messages. Have the application process each record, and transform the record into JSON format. When the queue is empty, send the results to another S3 bucket for internal processing and scale down the AWS Fargate instance.
Create an AWS Glue crawler and custom classifier based on the data feed formats and build a table definition to match. Invoke an AWS Lambda function on file delivery to start an AWS Glue ETL job to transform the entire record according to the processing and transformation requirements. Define the output format as JSON. Once complete, have the ETL job send the results to another S3 bucket for internal processing.
Create an AWS Glue crawler and custom classifier based upon the data feed formats and build a table definition to match. Perform an Amazon Athena query on file delivery to start an Amazon EMR ETL job to transform the entire record according to the processing and transformation requirements. Define the output format as JSON. Once complete, send the results to another S3 bucket for internal processing and scale down the EMR cluster.
Explanations:
While this approach uses Lambda functions and SQS to process records, chaining three separate Lambda functions adds orchestration complexity and potential performance bottlenecks. The first function must extract every record within Lambda's 15-minute maximum execution time, which a large file may exceed. SQS also has no native event that fires when a queue becomes empty, so invoking the final function requires additional custom logic. This design does not scale efficiently for high-volume feeds.
AWS Fargate is over-engineered for this use case, especially for regular, high-volume feeds of small records. Fargate removes server management, but the solution still requires building and maintaining a container image, a task definition, and a custom scaling policy driven by queue depth. Capping the service at a single task also limits throughput. The solution is also not as cost-efficient for simple, record-based transformations.
AWS Glue with a custom classifier and an ETL job is well suited to this use case. The crawler and classifier build the table definition from the feed format, and the ETL job processes large volumes of data efficiently, transforming and masking records as needed. Glue is serverless, scales with the job's workload, and integrates natively with S3, making it well suited to onboarding future feeds and handling complex transformations.
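Using Athena and EMR for this task introduces unnecessary complexity and cost. Athena is an interactive query service and cannot start an EMR job on its own, so the trigger described in this option would not work without additional components. EMR is typically used for larger, more complex big data processing and requires provisioning a cluster and scaling it down afterward. For simple record transformations, this solution is over-engineered and not ideal in terms of operational overhead or cost-efficiency.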
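The Lambda trigger in the Glue-based option can be sketched as follows. This is a minimal sketch, assuming an S3 event notification invokes the function when a file is delivered. The job name feed-transform-job, the output bucket, and the --source_path/--output_path arguments are hypothetical placeholders, not details from the question. The handler calls the boto3 Glue start_job_run API; the optional glue_client parameter exists only so the handler can be exercised without AWS credentials.

```python
"""Hypothetical Lambda handler that starts a Glue ETL job per delivered file.

Assumptions (not from the source): the job name, the output bucket, and the
"--source_path"/"--output_path" job arguments are illustrative placeholders.
"""
import urllib.parse

GLUE_JOB_NAME = "feed-transform-job"  # hypothetical Glue job name
OUTPUT_PATH = "s3://internal-processing-bucket/json/"  # hypothetical output bucket


def build_job_arguments(event):
    """Turn an S3 ObjectCreated event into one Glue argument dict per object.

    S3 URL-encodes object keys in event notifications (spaces become "+"),
    so each key is decoded with unquote_plus before building the S3 path.
    """
    runs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Glue job arguments are passed with a leading "--".
        runs.append({
            "--source_path": f"s3://{bucket}/{key}",
            "--output_path": OUTPUT_PATH,
        })
    return runs


def lambda_handler(event, context, glue_client=None):
    """Start one Glue ETL job run for each file in the event."""
    if glue_client is None:
        import boto3  # bundled with the AWS Lambda Python runtimes
        glue_client = boto3.client("glue")
    run_ids = []
    for args in build_job_arguments(event):
        response = glue_client.start_job_run(JobName=GLUE_JOB_NAME, Arguments=args)
        run_ids.append(response["JobRunId"])
    return {"jobRunIds": run_ids}
```

Inside the Glue job script, the arguments could then be read with getResolvedOptions from awsglue.utils, and the job would write its JSON output to the path given in --output_path. Keeping the Lambda function this thin leaves all transformation and masking logic in Glue, which is what lets this option scale without the per-record fan-out of the SQS-based designs.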