Which solution will meet these requirements with the LEAST operational overhead?

A real-estate company is launching a new product that predicts the prices of new houses. The historical data for the properties and prices is stored in .csv format in an Amazon S3 bucket. The data has a header, some categorical fields, and some missing values. The company’s data scientists have used Python with a common open-source library to fill the missing values with zeros. The data scientists have dropped all of the categorical fields and have trained a model by using the open-source linear regression algorithm with the default parameters. The accuracy of the predictions with the current model is below 50%. The company wants to improve the model performance and launch the new product as soon as possible.

Which solution will meet these requirements with the LEAST operational overhead?

Create a service-linked role for Amazon Elastic Container Service (Amazon ECS) with access to the S3 bucket. Create an ECS cluster that is based on an AWS Deep Learning Containers image. Write the code to perform the feature engineering. Train a logistic regression model for predicting the price, pointing to the bucket with the dataset. Wait for the training job to complete. Perform the inferences.

Create an Amazon SageMaker notebook with a new IAM role that is associated with the notebook. Pull the dataset from the S3 bucket. Explore different combinations of feature engineering transformations, regression algorithms, and hyperparameters. Compare all the results in the notebook, and deploy the most accurate configuration in an endpoint for predictions.

Create an IAM role with access to Amazon S3, Amazon SageMaker, and AWS Lambda. Create a training job with the SageMaker built-in XGBoost model pointing to the bucket with the dataset. Specify the price as the target feature. Wait for the job to complete. Load the model artifact to a Lambda function for inference on prices of new houses.

Create an IAM role for Amazon SageMaker with access to the S3 bucket. Create a SageMaker AutoML job with SageMaker Autopilot pointing to the bucket with the dataset. Specify the price as the target attribute. Wait for the job to complete. Deploy the best model for predictions.

Explanations:

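This option runs on Amazon ECS with AWS Deep Learning Containers, so the team must manage the cluster and write custom code for feature engineering, training, and inference. It also trains a logistic regression model, which is a classification algorithm and is not suited to predicting a continuous value such as a price. The extra infrastructure and code add operational overhead, so the option does not meet the least-overhead requirement.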

SageMaker notebooks are useful for exploring data and building models, but this option requires the data scientists to try combinations of feature engineering transformations, regression algorithms, and hyperparameters by hand and then compare the results themselves. That manual search is slow and labor-intensive, so it does not meet the need for a fast, low-overhead solution.
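To make the manual effort concrete, here is a minimal sketch of two common feature engineering transformations that a data scientist would have to write and tune by hand in a notebook: median imputation instead of zero-filling, and one-hot encoding instead of dropping categorical fields. The column names and values are hypothetical and are not taken from the company’s dataset; it uses only the Python standard library.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def one_hot(values):
    """Encode a categorical column as one 0/1 column per distinct category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


# Hypothetical columns for illustration only.
area_sqm = [120.0, None, 80.0, 100.0]
neighborhood = ["north", "south", "north", "east"]

area_filled = impute_median(area_sqm)
categories, neighborhood_encoded = one_hot(neighborhood)
```

Each such choice (median or mean, one-hot or another encoding) multiplies the number of configurations to compare, which is the overhead this option carries.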

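This option trains a single SageMaker built-in XGBoost model and then loads the artifact into a Lambda function for inference. The raw dataset cannot be used as is: the built-in XGBoost algorithm expects CSV training data without a header row and with the target in the first column, and the categorical fields still have to be encoded as numbers. The team would also have to package and maintain the inference code and its dependencies inside Lambda, which is not designed as a model-hosting service. The manual preprocessing and custom inference setup add operational overhead.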
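As an illustration of the preprocessing this option would still need, the following sketch reshapes a CSV with a header into the layout that the built-in XGBoost algorithm expects for CSV training input (no header row, target in the first column). The function name and sample data are hypothetical, and the sketch deliberately ignores categorical encoding and missing values, which would need further code.

```python
import csv
import io


def to_xgboost_csv(raw_csv, target):
    """Drop the header row and move the target column to the first position."""
    reader = csv.reader(io.StringIO(raw_csv))
    header = next(reader)
    target_idx = header.index(target)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        reordered = [row[target_idx]] + row[:target_idx] + row[target_idx + 1:]
        writer.writerow(reordered)
    return out.getvalue()


# Hypothetical sample with numeric columns only.
sample = "area,rooms,price\n120,3,250000\n80,2,180000\n"
prepared = to_xgboost_csv(sample, "price")
```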

This option meets the requirements. SageMaker Autopilot automates the entire process, from feature engineering to algorithm selection and hyperparameter tuning, so the data scientists only need to point the AutoML job at the dataset in S3 and name the price as the target attribute. Because it is a fully managed service, it minimizes operational overhead, addresses the need for better accuracy, and lets the company deploy the best model quickly with minimal intervention.
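As a rough sketch of how little configuration the Autopilot route needs, the code below builds the request for the SageMaker `create_auto_ml_job` API and submits it through boto3. The job name, bucket paths, role ARN, and target column name are hypothetical placeholders, and the choice of MSE as the objective metric is an assumption. boto3 is imported only inside `submit_autopilot_job`, so the request can be built and inspected without AWS access.

```python
def build_autopilot_request(job_name, s3_input_uri, s3_output_uri, target, role_arn):
    """Assemble keyword arguments for the SageMaker create_auto_ml_job call."""
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [
            {
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": s3_input_uri,
                    }
                },
                # Column that Autopilot should learn to predict.
                "TargetAttributeName": target,
            }
        ],
        "OutputDataConfig": {"S3OutputPath": s3_output_uri},
        # Price is a continuous value, so this is a regression problem.
        "ProblemType": "Regression",
        "AutoMLJobObjective": {"MetricName": "MSE"},  # assumed objective
        "RoleArn": role_arn,
    }


def submit_autopilot_job(request):
    """Submit the job; requires boto3 and valid AWS credentials."""
    import boto3

    client = boto3.client("sagemaker")
    return client.create_auto_ml_job(**request)


# Hypothetical values for illustration only.
request = build_autopilot_request(
    job_name="house-price-autopilot",
    s3_input_uri="s3://example-bucket/housing/",
    s3_output_uri="s3://example-bucket/autopilot-output/",
    target="price",
    role_arn="arn:aws:iam::123456789012:role/ExampleAutopilotRole",
)
```

Calling `submit_autopilot_job(request)` would start the job. Once it completes, the best candidate can be deployed to an endpoint from the SageMaker console or API.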
