Which approach would improve the availability and durability of the system while decreasing the processing latency and minimizing costs?

A Solutions Architect is responsible for redesigning a legacy Java application to improve its availability, data durability, and scalability. Currently, the application runs on a single high-memory Amazon EC2 instance. It accepts HTTP requests from upstream clients, adds them to an in-memory queue, and responds with a200 status. A separate application thread reads items from the queue, processes them, and persists the results to an Amazon RDS MySQL instance. The processing time for each item takes 90 seconds on average, most of which is spent waiting on external service calls, but the application is written to process multiple items in parallel.Traffic to this service is unpredictable. During periods of high load, items may sit in the internal queue for over an hour while the application processes the backlog.In addition, the current system has issues with availability and data loss if the single application node fails.Clients that access this service cannot be modified. They expect to receive a response to each HTTP request they send within 10 seconds before they will time out and retry the request.

Which approach would improve the availability and durability of the system while decreasing the processing latency and minimizing costs?

Create an Amazon API Gateway REST API that uses Lambda proxy integration to pass requests to an AWS Lambda function. Migrate the core processing code to a Lambda function and write a wrapper class that provides a handler method that converts the proxy events to the internal application data model and invokes the processing module.

Create an Amazon API Gateway REST API that uses a service proxy to put items in an Amazon SQS queue. Extract the core processing code from the existing application and update it to pull items from Amazon SQS instead of an in-memory queue. Deploy the new processing application to smaller EC2 instances within an Auto Scaling group that scales dynamically based on the approximate number of messages in the Amazon SQS queue.

Modify the application to use Amazon DynamoDB instead of Amazon RDS. Configure Auto Scaling for the DynamoDB table. Deploy the application within an Auto Scaling group with a scaling policy based on CPU utilization. Back the in-memory queue with a memory-mapped file to an instance store volume and periodically write that file to Amazon S3.

Update the application to use a Redis task queue instead of the in-memory queue. Build a Docker container image for the application. Create an Amazon ECS task definition that includes the application container and a separate container to host Redis. Deploy the new task definition as an ECS service using AWS Fargate, and enable Auto Scaling.

Explanations:

While using AWS Lambda could improve scalability and reduce management overhead, it may not meet the processing latency requirements due to cold start issues and is not well-suited for long-running tasks like the 90-second processing time described. Additionally, it does not address the need for an internal queue and data durability.

This option effectively decouples the HTTP request handling from the processing logic by utilizing Amazon SQS as a durable message queue. It allows for horizontal scaling through an Auto Scaling group for EC2 instances, addressing both availability and processing latency issues. Clients will receive a quick response while processing continues asynchronously in the background.

Switching to DynamoDB may improve scalability but does not directly address the need for a durable queue for processing items, and could still result in high latency due to the 90-second processing time. Additionally, backing the in-memory queue with a memory-mapped file is not a reliable solution for ensuring data durability and could introduce complexity without addressing the core issues.

Although using Redis for task queuing could improve performance, it introduces additional complexity with managing Redis as a separate service and does not provide a durable queue for failed or retried requests. ECS and Fargate offer scaling benefits but may still face issues with long processing times and does not provide the robust handling of unpredictable traffic compared to SQS.

Learn & move to cloud

Which approach would improve the availability and durability of the system while decreasing the processing latency and minimizing costs?

Explanations:

Leave a Reply Cancel reply