Which architecture changes would ensure that provisioned resources are being utilized effectively?

A machine learning specialist is running an Amazon SageMaker endpoint using the built-in object detection algorithm on a P3 instance for real-time predictions in a company’s production application. When evaluating the model’s resource utilization, the specialist notices that the model is using only a fraction of the GPU.

Which architecture changes would ensure that provisioned resources are being utilized effectively?

Redeploy the model as a batch transform job on an M5 instance.

Redeploy the model on an M5 instance. Attach Amazon Elastic Inference to the instance.

Redeploy the model on a P3dn instance.

Deploy the model onto an Amazon Elastic Container Service (Amazon ECS) cluster using a P3 instance.

Explanations:

Batch transform jobs run offline inference over an entire dataset and cannot serve real-time predictions, which the production application requires.

M5 instances are general-purpose CPU instances. Attaching Amazon Elastic Inference adds only the amount of GPU acceleration that inference needs, so the endpoint no longer pays for a full P3 GPU that it barely uses. This is the correct choice.

P3dn instances provide more GPU capacity than P3 instances. Because the workload already uses only a fraction of the P3 GPU, moving to P3dn would increase the over-provisioning and add unnecessary cost.

Deploying to Amazon ECS on a P3 instance keeps the same GPU, so the underutilization remains. Changing the hosting platform does not reduce the provisioned GPU capacity.
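As a rough illustration of the M5 plus Elastic Inference option, the sketch below builds the request parameters for the SageMaker CreateEndpointConfig API, where a production variant can carry an `AcceleratorType` alongside its `InstanceType`. The config name, model name, and the specific sizes (`ml.m5.xlarge`, `ml.eia2.medium`) are assumptions chosen for the example and do not come from the question; the right sizes depend on the measured load.

```python
from typing import Any, Dict


def build_endpoint_config(
    config_name: str,
    model_name: str,
    instance_type: str = "ml.m5.xlarge",       # assumed CPU instance size
    accelerator_type: str = "ml.eia2.medium",  # assumed Elastic Inference size
) -> Dict[str, Any]:
    """Return keyword arguments for the SageMaker CreateEndpointConfig API."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InitialInstanceCount": 1,
                # General-purpose CPU host instead of a full P3 GPU instance.
                "InstanceType": instance_type,
                # Fractional GPU acceleration attached to the CPU host.
                "AcceleratorType": accelerator_type,
            }
        ],
    }


if __name__ == "__main__":
    # Both names below are hypothetical placeholders.
    params = build_endpoint_config(
        "object-detection-ei-config", "object-detection-model"
    )
    print(params)
    # With AWS credentials configured, the request could then be sent with:
    #   import boto3
    #   boto3.client("sagemaker").create_endpoint_config(**params)
```

Keeping the parameters in a plain dictionary makes the instance and accelerator sizes easy to adjust after checking the endpoint's utilization metrics again.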
