Which of the following options are the MOST effective in solving the issue while keeping costs to a minimum?
(Choose two.)
Configure the endpoint to use Amazon Elastic Inference (EI) accelerators.
Create a new endpoint configuration with two production variants.
Configure the endpoint to automatically scale with the InvocationsPerInstance metric.
Deploy a second instance pool to support a blue/green deployment of models.
Reconfigure the endpoint to use burstable instances.
Explanations:
Amazon Elastic Inference (EI) accelerators can reduce response times and lower costs by attaching fractional GPU acceleration to TensorFlow models on SageMaker without provisioning full GPU instances. This improves inference performance at a lower cost than scaling out with additional compute-optimized instances.
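A minimal sketch of how an EI accelerator could be attached at deployment time with the SageMaker Python SDK. The S3 model path, IAM role, framework version, and instance/accelerator types are illustrative assumptions, not values from the scenario.

```python
from sagemaker.tensorflow import TensorFlowModel

# Hypothetical TensorFlow model artifact and execution role.
model = TensorFlowModel(
    model_data="s3://my-bucket/model/model.tar.gz",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    framework_version="2.3",
)

# accelerator_type pairs a CPU instance with fractional GPU acceleration
# (e.g. ml.eia2.medium) instead of requiring a full GPU instance.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    accelerator_type="ml.eia2.medium",
)
```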
Creating a new endpoint configuration with two production variants would add complexity without directly addressing the increasing response times and errors. Production variants are better suited to A/B testing multiple models in production than to managing load.
Configuring automatic scaling based on the InvocationsPerInstance metric allows the recommendation engine to handle peak loads by scaling out during high-traffic times and scaling in when demand is lower, optimizing costs and maintaining performance.
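A minimal sketch of a target-tracking scaling policy on the SageMakerVariantInvocationsPerInstance metric using boto3 and Application Auto Scaling. The endpoint name, variant name, capacity limits, target value, and cooldowns are illustrative assumptions.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Hypothetical endpoint and production variant names.
resource_id = "endpoint/my-recommender-endpoint/variant/AllTraffic"

# Register the variant's instance count as a scalable target.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Scale out when average invocations per instance exceed the target,
# and scale in when traffic drops, keeping costs aligned with demand.
autoscaling.put_scaling_policy(
    PolicyName="InvocationsPerInstanceScaling",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 1000.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 300,
    },
)
```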
Deploying a second instance pool for blue/green deployment does not address scaling issues and is generally used for updating models without downtime. This approach would increase costs without directly impacting response times or errors during peak load times.
Burstable instances are suited to occasional, short bursts of high CPU usage. Under sustained load they can exhaust their CPU credits and be throttled, degrading response times and user experience during high-traffic periods.