Which solution will meet these requirements with the MOST operational efficiency?

A music streaming company is building a pipeline to extract features. The company wants to store the features for offline model training and online inference. The company wants to track feature history and to give the company’s data science teams access to the features.

Which solution will meet these requirements with the MOST operational efficiency?

Use Amazon SageMaker Feature Store to store features for model training and inference. Create an online store for online inference. Create an offline store for model training. Create an IAM role for data scientists to access and search through feature groups.

Use Amazon SageMaker Feature Store to store features for model training and inference. Create an online store for both online inference and model training. Create an IAM role for data scientists to access and search through feature groups.

Create one Amazon S3 bucket to store online inference features. Create a second S3 bucket to store offline model training features. Turn on versioning for the S3 buckets and use tags to specify which objects hold online inference features and which hold offline model training features. Use Amazon Athena to query the S3 bucket for online inference. Connect the S3 bucket for offline model training to a SageMaker training job. Create an IAM policy that allows data scientists to access both buckets.

Create two separate Amazon DynamoDB tables to store online inference features and offline model training features. Use time-based versioning on both tables. Query the DynamoDB table for online inference. Move the data from DynamoDB to Amazon S3 when a new SageMaker training job is launched. Create an IAM policy that allows data scientists to access both tables.

Explanations:

Amazon SageMaker Feature Store provides both an online store and an offline store in a single managed service. The offline store retains the history of feature records, which covers the requirement to track feature history. Using the online store for inference and the offline store for model training meets the storage requirements, and an IAM role gives data scientists controlled access to search and read the feature groups.
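As a rough illustration of this option, the sketch below builds a request for the SageMaker CreateFeatureGroup API with both stores enabled. The feature names (track_id, event_time, tempo_bpm), the group name, the bucket URI, and the role ARN are hypothetical placeholders that do not come from the question. The call to AWS is wrapped in a function that is not invoked here, because it needs valid credentials and an existing role.

```python
def build_feature_group_request(name, s3_uri, role_arn):
    """Build keyword arguments for SageMaker's CreateFeatureGroup API.

    All feature names below are hypothetical examples.
    """
    return {
        "FeatureGroupName": name,
        # Feature that uniquely identifies a record (hypothetical name).
        "RecordIdentifierFeatureName": "track_id",
        # Feature that timestamps each record (hypothetical name).
        "EventTimeFeatureName": "event_time",
        "FeatureDefinitions": [
            {"FeatureName": "track_id", "FeatureType": "String"},
            {"FeatureName": "event_time", "FeatureType": "String"},
            {"FeatureName": "tempo_bpm", "FeatureType": "Fractional"},
        ],
        # Online store: low-latency reads for online inference.
        "OnlineStoreConfig": {"EnableOnlineStore": True},
        # Offline store: records written to S3 for model training.
        "OfflineStoreConfig": {"S3StorageConfig": {"S3Uri": s3_uri}},
        "RoleArn": role_arn,
    }


def create_feature_group(request):
    """Send the request to SageMaker (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builder works without it

    return boto3.client("sagemaker").create_feature_group(**request)


# Placeholder values for illustration only.
request = build_feature_group_request(
    name="music-track-features",
    s3_uri="s3://example-bucket/feature-store",
    role_arn="arn:aws:iam::123456789012:role/ExampleFeatureStoreRole",
)
```

Passing `request` to `create_feature_group` would create one feature group backed by both stores, so the same ingested records serve inference and training without a second pipeline.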

SageMaker Feature Store is the right service, but using only the online store is not the best fit. The online store is optimized for low-latency lookups of the latest record for each identifier, not for the batch reads that model training needs. Because it keeps only the latest record, it also does not retain the feature history that the company wants to track.
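The difference between the two stores can be illustrated with a toy in-memory model. This is not the Feature Store API. It is a simplified sketch that assumes the online store keeps only the latest record per identifier while the offline store appends every record. The class name, method names, and feature values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFeatureGroup:
    """Toy model of online/offline store semantics (not the real service)."""

    online: dict = field(default_factory=dict)   # record id -> latest record
    offline: list = field(default_factory=list)  # append-only history

    def put_record(self, record_id, event_time, features):
        record = {"id": record_id, "event_time": event_time, **features}
        # The offline store keeps every record that was ever written.
        self.offline.append(record)
        # The online store keeps only the most recent record per id.
        current = self.online.get(record_id)
        if current is None or event_time >= current["event_time"]:
            self.online[record_id] = record

    def get_record(self, record_id):
        """Low-latency style lookup: latest values only."""
        return self.online.get(record_id)

    def history(self, record_id):
        """Batch style read: all versions, ordered by event time."""
        matches = [r for r in self.offline if r["id"] == record_id]
        return sorted(matches, key=lambda r: r["event_time"])


fg = ToyFeatureGroup()
fg.put_record("track-1", 1, {"tempo_bpm": 118.0})
fg.put_record("track-1", 2, {"tempo_bpm": 120.0})

latest = fg.get_record("track-1")   # what online inference would read
versions = fg.history("track-1")    # what a training job could read
```

In this model, `latest` holds only the second write, while `versions` holds both, which is why a training workflow that needs feature history cannot rely on the online store alone.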

Using two Amazon S3 buckets requires the company to manage versioning and tagging itself, which is less efficient than a managed service like SageMaker Feature Store. Athena is built for analytical queries rather than low-latency lookups, so it is a poor fit for serving online inference features. Wiring Athena and the SageMaker training jobs to separate buckets adds further complexity and reduces operational efficiency.

Storing features in separate DynamoDB tables for online and offline use introduces unnecessary complexity. DynamoDB is optimized for low-latency, real-time access, which is not suitable for batch model training. Moving data to S3 for training further complicates the solution. SageMaker Feature Store is a more appropriate service for these use cases.
