Which solution will enable the company to achieve its goal with the LEAST operational overhead?
Create an Amazon SageMaker notebook instance that pulls all the models from Amazon S3 by using the boto3 library. Remove the existing instances and use the notebook to run a SageMaker batch transform that performs offline inference for all possible users in all the cities. Store the results in separate files in Amazon S3. Point the web client to those files.
Prepare an Amazon SageMaker Docker container based on the open-source multi-model server. Remove the existing instances and create a multi-model endpoint in SageMaker instead, pointing to the S3 bucket containing all the models. Invoke the endpoint from the web client at runtime, specifying the TargetModel parameter according to the city of each request.
Keep a single EC2 instance to host all the models. Install a model server on the instance and load each model by pulling it from Amazon S3. Integrate the instance with the web client through Amazon API Gateway so that it responds to requests in real time, specifying the target resource according to the city of each request.
Prepare a Docker container based on the prebuilt images in Amazon SageMaker. Replace the existing instances with separate SageMaker endpoints, one for each city where the company operates. Invoke the endpoints from the web client, specifying the endpoint URL and the EndpointName parameter according to the city of each request.
Explanations:
This option replaces real-time inference with offline batch processing: predictions must be precomputed for every possible user in every city, stored as files in Amazon S3, and manually managed and refreshed whenever the models change. That increases operational overhead rather than reducing it, and the web client can no longer receive on-demand responses.
A SageMaker multi-model endpoint lets the company serve all the city-specific models from a single endpoint: models are loaded dynamically from the S3 prefix as they are requested, so compute resources are shared across models while real-time inference is preserved. This removes the need for a separate instance per city and is the option with the least operational overhead.
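For illustration, a minimal sketch of how the web client's backend could invoke such a multi-model endpoint at runtime. The endpoint name, content type, and per-city artifact naming scheme below are assumptions for the sketch, not values given in the question; TargetModel must match the key of a model.tar.gz stored under the S3 prefix the endpoint points to.

```python
import boto3

# Runtime client for invoking SageMaker endpoints
runtime = boto3.client("sagemaker-runtime")

def predict_for_city(city: str, payload: bytes) -> bytes:
    """Invoke the shared multi-model endpoint, routing the request to the city's model."""
    response = runtime.invoke_endpoint(
        EndpointName="recommendations-mme",   # assumed endpoint name
        ContentType="text/csv",               # assumed payload format
        TargetModel=f"{city}/model.tar.gz",   # per-city model artifact under the S3 prefix
        Body=payload,
    )
    return response["Body"].read()
```

Because TargetModel is resolved per request, supporting a new city only requires uploading another model.tar.gz to the same S3 prefix; no new endpoint or instance has to be provisioned.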
Although this option consolidates the models onto a single EC2 instance, the company still has to manage the model server, model loading, and the instance itself. In addition, a single instance can become a bottleneck under load, increasing latency and degrading performance.
Creating a separate SageMaker endpoint for each city multiplies resource allocation and management complexity: every endpoint runs on its own instances that must be provisioned, monitored, and paid for. This forgoes the efficiency and scalability of a multi-model architecture, in which one endpoint serves all the models.
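For completeness, a hedged sketch of how the single multi-model endpoint behind the correct option could be created with boto3. The container image URI, role ARN, bucket, resource names, and instance type are placeholders, not values from the question.

```python
import boto3

sm = boto3.client("sagemaker")

# Placeholder values; replace with the account's actual image, role, and bucket.
IMAGE_URI = "<account>.dkr.ecr.<region>.amazonaws.com/multi-model-server:latest"
ROLE_ARN = "arn:aws:iam::<account>:role/SageMakerExecutionRole"
MODEL_DATA_PREFIX = "s3://<bucket>/city-models/"  # prefix holding one model.tar.gz per city

# One SageMaker model definition in MultiModel mode, pointing at the S3 prefix
sm.create_model(
    ModelName="recommendations-mme-model",
    ExecutionRoleArn=ROLE_ARN,
    PrimaryContainer={
        "Image": IMAGE_URI,
        "Mode": "MultiModel",            # serve many models from one container
        "ModelDataUrl": MODEL_DATA_PREFIX,
    },
)

# A single endpoint config and endpoint back all cities
sm.create_endpoint_config(
    EndpointConfigName="recommendations-mme-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "recommendations-mme-model",
        "InstanceType": "ml.m5.xlarge",  # assumed instance type
        "InitialInstanceCount": 1,
    }],
)

sm.create_endpoint(
    EndpointName="recommendations-mme",
    EndpointConfigName="recommendations-mme-config",
)
```

One endpoint, one endpoint configuration, and one fleet of instances serve every city, which is what keeps the operational overhead low compared with the per-city alternatives.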