Which solution meets these requirements?
The requests from the API are sent to an Application Load Balancer (ALB). Models are deployed as AWS Lambda functions invoked by the ALB.
The requests from the API are sent to the model’s Amazon Simple Queue Service (Amazon SQS) queue. Models are deployed as AWS Lambda functions triggered by SQS events. AWS Auto Scaling is enabled on Lambda to increase the number of vCPUs based on the SQS queue size.
The requests from the API are sent to the model’s Amazon Simple Queue Service (Amazon SQS) queue. Models are deployed as Amazon Elastic Container Service (Amazon ECS) services reading from the queue. AWS App Mesh scales the instances of the ECS cluster based on the SQS queue size.
The requests from the API are sent to the model’s Amazon Simple Queue Service (Amazon SQS) queue. Models are deployed as Amazon Elastic Container Service (Amazon ECS) services reading from the queue. AWS Auto Scaling is enabled on Amazon ECS for both the cluster and copies of the service based on the queue size.
Explanations:
AWS Lambda is a poor fit for loading 1 GB of model data into memory at startup. With irregular usage patterns, invocations frequently hit cold starts, and each cold start must reload the model before it can serve a request. Lambda is also capped at a 15-minute execution timeout and is not designed for long-lived processes that keep data in memory.
Routing requests through Amazon SQS allows asynchronous processing, but Lambda functions still pay the 1 GB model-loading cost on cold starts, and the execution time limit makes large batch processing difficult. In addition, AWS Auto Scaling cannot increase the number of vCPUs for a Lambda function; Lambda allocates CPU in proportion to the configured memory.
Deploying models as Amazon ECS services handles the memory and processing needs better than Lambda, but AWS App Mesh is a service mesh for service-to-service communication, not a scaling mechanism. It cannot scale the ECS cluster based on SQS queue size.
This option deploys the models on Amazon ECS, which can allocate the memory and resources needed to load large model data. AWS Auto Scaling based on SQS queue size adjusts both the cluster and the number of service copies, so capacity follows the request load during both low and high usage periods.
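The scaling logic behind this option can be sketched as a backlog-per-task calculation. This is a minimal illustration, not the AWS implementation: the function name `desired_task_count`, the target of 10 messages per task, and the min/max bounds are all hypothetical values chosen for the example. In practice the queue depth would typically come from the SQS `ApproximateNumberOfMessagesVisible` CloudWatch metric, and a scaling policy on the ECS service would apply the result.

```python
import math


def desired_task_count(queue_depth: int,
                       target_backlog_per_task: int,
                       min_tasks: int = 1,
                       max_tasks: int = 20) -> int:
    """Return how many ECS tasks should run for a given SQS backlog.

    All parameters are illustrative assumptions:
    - queue_depth: number of visible messages waiting in the queue
    - target_backlog_per_task: messages one task is expected to absorb
    - min_tasks / max_tasks: bounds on the service's desired count
    """
    if target_backlog_per_task <= 0:
        raise ValueError("target_backlog_per_task must be positive")
    if queue_depth < 0:
        raise ValueError("queue_depth must not be negative")

    # Round up so a partial backlog still gets a task to process it.
    needed = math.ceil(queue_depth / target_backlog_per_task)

    # Clamp to the configured bounds of the service.
    return max(min_tasks, min(max_tasks, needed))


if __name__ == "__main__":
    for depth in (0, 95, 1000):
        print(depth, desired_task_count(depth, target_backlog_per_task=10))
```

Keeping `min_tasks` at 1 or higher means at least one task always holds the model in memory, which avoids reloading the 1 GB of model data when requests arrive after a quiet period.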