Which solution will meet these requirements?

A company is training machine learning (ML) models on Amazon SageMaker by using 200 TB of data that is stored in Amazon S3 buckets. The training data consists of individual files that are each larger than 200 MB in size. The company needs a data access solution that offers the shortest processing time and the least amount of setup.

Which solution will meet these requirements?

Use File mode in SageMaker to copy the dataset from the S3 buckets to the ML instance storage.

Create an Amazon FSx for Lustre file system. Link the file system to the S3 buckets.

Create an Amazon Elastic File System (Amazon EFS) file system. Mount the file system to the training instances.

Use FastFile mode in SageMaker to stream the files on demand from the S3 buckets.

Explanations:

File mode copies the entire dataset from S3 to the local storage of the training instance before training starts. With 200 TB of data, that download delays the start of training considerably, and the instance needs enough attached storage to hold the full dataset. This does not give the shortest processing time.
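To put the copy delay in perspective, here is a rough back-of-the-envelope estimate. The 10 Gbit/s sustained throughput is an assumed figure chosen for illustration only, and `copy_time_hours` is a hypothetical helper. Real throughput depends on the instance type, the storage volume, and how the download is parallelized.

```python
def copy_time_hours(dataset_tb: float, throughput_gbit_s: float) -> float:
    """Estimate the hours needed to copy a dataset at a sustained throughput."""
    dataset_bits = dataset_tb * 1e12 * 8            # decimal terabytes -> bits
    seconds = dataset_bits / (throughput_gbit_s * 1e9)
    return seconds / 3600


# 200 TB from the question; 10 Gbit/s is an assumption, not a measured value.
hours = copy_time_hours(200, 10)
print(f"{hours:.1f} hours")  # prints "44.4 hours"
```

Under that assumption, training could not begin until almost two days after the job started.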

Amazon FSx for Lustre provides high-performance storage and can be linked to S3, but it requires extra setup: you must provision the file system, configure networking so that the training job can reach it, and link it to the S3 buckets. That added work conflicts with the requirement for the least amount of setup.

Amazon EFS provides a scalable file system, but the training data must already reside in EFS before a job can read it. Creating the file system and moving 200 TB from S3 into it adds both setup work and time, and EFS is a general-purpose file system rather than one built for the high-throughput reads that ML training needs.

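FastFile mode in SageMaker streams the data directly from S3 on demand, so training can start without first copying the dataset to local storage. It needs no additional infrastructure beyond setting the input mode on the training job, and it suits large files like the ones described here. This option meets both requirements: the shortest processing time and the least amount of setup.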
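As a minimal sketch of how little configuration FastFile mode involves, the snippet below builds one input channel in the shape used by the `InputDataConfig` list of the SageMaker `CreateTrainingJob` API. The helper `fastfile_channel`, the channel name, and the bucket URI are hypothetical placeholders, not values from the question.

```python
import json


def fastfile_channel(name: str, s3_uri: str) -> dict:
    """Build one training input channel that streams from S3 in FastFile mode."""
    return {
        "ChannelName": name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",            # treat the URI as a key prefix
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        # The only setting that differs from a File mode channel.
        "InputMode": "FastFile",
    }


# Placeholder bucket and prefix, for illustration only.
channel = fastfile_channel("training", "s3://example-bucket/training-data/")
print(json.dumps(channel, indent=2))
```

The resulting dictionary would go in the `InputDataConfig` list of a training job request. With the SageMaker Python SDK, the equivalent setting should be `input_mode="FastFile"` on a `TrainingInput`.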
