Which solution should an ML specialist use to deliver the transcriptions to the S3 bucket as quickly as possible?

A company is using Amazon SageMaker to build a machine learning (ML) model to predict customer churn based on customer call transcripts. Audio files from customer calls are located in an on-premises VoIP system that has petabytes of recorded calls. The on-premises infrastructure has high-velocity networking and connects to the company’s AWS infrastructure through a VPN over a 100 Mbps connection. The company has an algorithm for transcribing customer calls that requires GPUs for inference. The company wants to store these transcriptions in an Amazon S3 bucket in the AWS Cloud for model development.

Which solution should an ML specialist use to deliver the transcriptions to the S3 bucket as quickly as possible?

Order and use an AWS Snowball Edge Compute Optimized device with an NVIDIA Tesla module to run the transcription algorithm. Use AWS DataSync to send the resulting transcriptions to the transcription S3 bucket.

Order and use an AWS Snowcone device with Amazon EC2 Inf1 instances to run the transcription algorithm. Use AWS DataSync to send the resulting transcriptions to the transcription S3 bucket.

Order and use AWS Outposts to run the transcription algorithm on GPU-based Amazon EC2 instances. Store the resulting transcriptions in the transcription S3 bucket.

Use AWS DataSync to ingest the audio files to Amazon S3. Create an AWS Lambda function to run the transcription algorithm on the audio files when they are uploaded to Amazon S3. Configure the function to write the resulting transcriptions to the transcription S3 bucket.

Explanations:

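An AWS Snowball Edge Compute Optimized device with the NVIDIA Tesla GPU option can run the GPU-based transcription algorithm on premises, next to the audio files, so the petabytes of raw audio never have to cross the 100 Mbps link. Only the resulting transcriptions, which are plain text and much smaller than the audio, need to reach Amazon S3, and AWS DataSync can move that smaller data set over the existing connection. Snowball Edge is designed for exactly this kind of edge processing of large local data sets, which makes it the fastest way to get the transcriptions into the bucket.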
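To put a rough number on how much smaller the transcriptions are than the audio, the following sketch estimates transcript size and upload time. It is a back-of-the-envelope estimate under hypothetical assumptions that are not stated in the question: recordings stored as G.711 audio at 64 kbit/s, speech at about 150 words per minute, and about 6 bytes of plain text per word. Decimal units are assumed (1 PB = 10^15 bytes), and the link is assumed to run at full line rate with no overhead.

```python
# Back-of-the-envelope estimate. All constants except LINK_BPS are
# HYPOTHETICAL assumptions, not facts from the question.
AUDIO_BITRATE_BPS = 64_000     # assumed codec: G.711 at 64 kbit/s
WORDS_PER_MINUTE = 150         # assumed average speaking rate
BYTES_PER_WORD = 6             # assumed plain-text bytes per word (incl. space)
LINK_BPS = 100_000_000         # the 100 Mbps connection from the scenario
SECONDS_PER_DAY = 86_400


def transcript_bytes(audio_bytes: float) -> float:
    """Estimate transcript size (bytes) for a given amount of audio (bytes)."""
    audio_bytes_per_minute = AUDIO_BITRATE_BPS / 8 * 60       # 480,000 bytes
    text_bytes_per_minute = WORDS_PER_MINUTE * BYTES_PER_WORD  # 900 bytes
    return audio_bytes * text_bytes_per_minute / audio_bytes_per_minute


def upload_days(num_bytes: float, link_bps: float = LINK_BPS) -> float:
    """Days to send num_bytes over the link at full line rate."""
    return num_bytes * 8 / link_bps / SECONDS_PER_DAY


if __name__ == "__main__":
    one_pb = 1e15
    text = transcript_bytes(one_pb)
    print(f"Transcripts for 1 PB of audio: ~{text / 1e12:.2f} TB")
    print(f"Upload time over 100 Mbps: ~{upload_days(text):.1f} days")
```

Under these assumptions, 1 PB of audio yields under 2 TB of text, which the existing link can carry in under two days. Adjust the constants to the real codec and call profile; the ratio between audio and text is what matters, not the exact figure.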

AWS Snowcone is much smaller and less capable than Snowball Edge: it has limited CPU and memory, no GPU, and it does not offer Amazon EC2 Inf1 instances. In addition, Inf1 instances are powered by AWS Inferentia chips rather than GPUs, so an algorithm that requires GPUs for inference would not run on them without being ported.

AWS Outposts extends AWS infrastructure to on-premises locations and can run EC2 instances there, but it is a poor fit for this use case. An Outpost must be ordered, the site prepared, and the rack installed, which typically takes longer than receiving a Snowball Edge device, and it requires a long-term commitment that is expensive for what is essentially a one-time batch transcription job.

AWS Lambda does not provide GPUs, and each invocation can run for at most 15 minutes, so it cannot run a GPU-based transcription algorithm. Ingesting the audio with AWS DataSync is also the bottleneck rather than a workable step: moving petabytes of recordings over a 100 Mbps connection would take years, far longer than shipping a Snowball Edge device.
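Why uploading the raw audio is impractical comes down to simple arithmetic. The following Python sketch computes how long it takes to move a given number of bytes over the scenario's 100 Mbps connection. Decimal units are assumed (1 PB = 10^15 bytes), and the `efficiency` parameter is a hypothetical knob for VPN and protocol overhead, defaulting to a best-case full line rate.

```python
LINK_BPS = 100_000_000   # the 100 Mbps connection from the scenario
SECONDS_PER_DAY = 86_400


def days_to_transfer(num_bytes: float,
                     link_bps: float = LINK_BPS,
                     efficiency: float = 1.0) -> float:
    """Days to move num_bytes over a link of link_bps bits per second.

    efficiency (hypothetical) is the fraction of line rate actually achieved;
    1.0 means no protocol or VPN overhead, which is a best case.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return num_bytes * 8 / (link_bps * efficiency) / SECONDS_PER_DAY


if __name__ == "__main__":
    for petabytes in (1, 2, 5):
        days = days_to_transfer(petabytes * 1e15)
        print(f"{petabytes} PB at 100 Mbps: ~{days:,.0f} days "
              f"(~{days / 365:.1f} years)")
```

Even in the best case, 1 PB takes about 926 days (roughly 2.5 years) before any transcription could begin in the cloud, and any real-world overhead only makes it worse.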
