Which solution will meet these requirements MOST cost-effectively?
Use Amazon Elastic Inference on the SageMaker hosted endpoint.
Retrain the CNN with more layers and a larger dataset.
Retrain the CNN with more layers and a smaller dataset.
Choose a SageMaker instance type that has multiple GPUs.
Explanations:
Amazon Elastic Inference lets you attach a low-cost, GPU-powered inference accelerator to a SageMaker hosted endpoint. This increases throughput and reduces latency without paying for a full GPU instance (see the deployment sketch after the explanations).
Retraining the CNN with more layers and a larger dataset may improve accuracy, but it increases the model's computational requirements, which raises cost and is likely to worsen inference latency rather than improve it.
Retraining with more layers and a smaller dataset is unlikely to achieve the desired improvements in throughput or latency and may degrade model performance.
Choosing a SageMaker instance type with multiple GPUs would increase throughput and reduce latency, but it is the most expensive option; Elastic Inference provides the needed acceleration at a fraction of the cost, making it the more cost-effective choice.
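A minimal sketch of attaching an Elastic Inference accelerator at deployment time using the SageMaker Python SDK, assuming a trained TensorFlow CNN artifact already stored in S3; the bucket path, framework version, and the specific instance and accelerator types shown here are illustrative, not prescribed by the question.

```python
import sagemaker
from sagemaker.tensorflow import TensorFlowModel

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # assumes a SageMaker execution role is available

# Placeholder S3 path for the trained CNN model artifact.
model = TensorFlowModel(
    model_data="s3://my-bucket/cnn/model.tar.gz",
    role=role,
    framework_version="2.3",
    sagemaker_session=session,
)

# Deploy on an inexpensive CPU instance and attach an Elastic Inference
# accelerator, rather than paying for a full GPU instance.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.large",       # low-cost CPU instance
    accelerator_type="ml.eia2.medium", # Elastic Inference accelerator
)
```

Setting accelerator_type on deploy() is what attaches the accelerator to the endpoint; the host instance can then stay small and cheap while the accelerator handles the heavy inference compute.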