Which architecture should be used to scale the solution at the lowest cost?
Implement the solution using AWS Deep Learning Containers and run the container as a job using AWS Batch on a GPU-compatible Spot Instance
Implement the solution using a low-cost GPU-compatible Amazon EC2 instance and use the AWS Instance Scheduler to schedule the task
Implement the solution using AWS Deep Learning Containers, run the workload using AWS Fargate running on Spot Instances, and then schedule the task using the built-in task scheduler
Implement the solution using Amazon ECS running on Spot Instances and schedule the task using the ECS service scheduler
Explanations:
AWS Deep Learning Containers are prebuilt Docker images optimized for popular deep learning frameworks. Running such a container as a job with AWS Batch on a GPU-compatible Spot Instance minimizes cost, since Spot Instances are significantly cheaper than On-Demand capacity. Batch also handles job queuing, retries on Spot interruption, and automatic scaling of compute resources, which suits a weekly automated process well.
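The architecture described above can be sketched as the configuration payloads AWS Batch's API expects (boto3's `create_compute_environment` and `register_job_definition`). The resource names, subnet ID, IAM roles, instance type, and container image URI below are illustrative assumptions, not values from the question:

```python
# Hypothetical sketch of the correct option: a Spot, GPU-capable Batch
# compute environment plus a job definition that requests one GPU.
# All identifiers here are placeholders, not real account resources.

def spot_gpu_compute_environment(name, subnets, instance_role, spot_fleet_role):
    """Payload shape for boto3 batch.create_compute_environment."""
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "computeResources": {
            "type": "SPOT",                    # Spot capacity for lowest cost
            "bidPercentage": 60,               # pay at most 60% of On-Demand
            "minvCpus": 0,                     # scale to zero between weekly runs
            "maxvCpus": 16,
            "instanceTypes": ["g4dn.xlarge"],  # GPU-compatible instance family
            "subnets": subnets,
            "instanceRole": instance_role,
            "spotIamFleetRole": spot_fleet_role,
        },
    }

def gpu_job_definition(name, image):
    """Payload shape for boto3 batch.register_job_definition."""
    return {
        "jobDefinitionName": name,
        "type": "container",
        "containerProperties": {
            "image": image,                    # e.g. a Deep Learning Container image
            "vcpus": 4,
            "memory": 16000,
            "resourceRequirements": [
                # Batch places the job only on hosts that can satisfy this
                {"type": "GPU", "value": "1"},
            ],
        },
    }

env = spot_gpu_compute_environment(
    "weekly-dl-spot", ["subnet-abc123"], "ecsInstanceRole", "spotFleetRole")
job = gpu_job_definition(
    "weekly-training", "example.registry/pytorch-training:latest")
```

Setting `minvCpus` to 0 is what keeps the weekly workload cheap: Batch scales the Spot fleet down to nothing between runs, so nothing is billed while idle.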
While a low-cost GPU-compatible EC2 instance could run the workload, the AWS Instance Scheduler only starts and stops the instance on a schedule; it does not queue jobs, retry failures, or scale capacity. Compared with Batch on Spot Instances, this approach adds operational overhead and typically costs more, since the instance is billed at On-Demand rates whenever it runs.
Although AWS Deep Learning Containers can run on AWS Fargate, Fargate does not support GPU resources. That makes it unsuitable for deep learning tasks that depend on GPU acceleration, so this option cannot meet the workload requirements regardless of its cost profile.
Amazon ECS on Spot Instances is cost-effective, but the ECS service scheduler is designed to keep long-running services healthy, not to run a recurring batch job. ECS also lacks the managed job queues, automatic retries, and GPU-aware job placement that AWS Batch provides, so it would handle the weekly GPU workload less efficiently and with more operational effort.
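To round out the "weekly automated process" in the correct option, the trigger side can be sketched as an Amazon EventBridge rule that fires on a cron schedule and submits the Batch job. The rule name, ARNs, and job-definition name below are illustrative assumptions:

```python
# Hypothetical sketch: payload shapes for boto3 events.put_rule and
# events.put_targets, wiring a weekly cron schedule to a Batch job queue.
# All ARNs and names are placeholders.

WEEKLY_CRON = "cron(0 3 ? * MON *)"  # every Monday 03:00 UTC, EventBridge syntax

def weekly_batch_trigger(rule_name, job_queue_arn, job_definition, role_arn):
    """Return the (rule, targets) payload pair for a weekly Batch submission."""
    rule = {
        "Name": rule_name,
        "ScheduleExpression": WEEKLY_CRON,
        "State": "ENABLED",
    }
    targets = {
        "Rule": rule_name,
        "Targets": [{
            "Id": "weekly-training-job",
            "Arn": job_queue_arn,        # EventBridge submits directly to Batch
            "RoleArn": role_arn,
            "BatchParameters": {
                "JobDefinition": job_definition,
                "JobName": "weekly-training",
            },
        }],
    }
    return rule, targets

rule, targets = weekly_batch_trigger(
    "weekly-dl-training",
    "arn:aws:batch:us-east-1:123456789012:job-queue/weekly-dl-queue",
    "weekly-training:1",
    "arn:aws:iam::123456789012:role/EventBridgeBatchRole")
```

Using EventBridge as the trigger keeps the whole pipeline serverless on the control side: nothing runs between weekly invocations, consistent with the lowest-cost goal.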