Which solution will meet these requirements with the LEAST amount of compute resources?

By: study aws cloud

On: January 10, 2025

Tagged: Machine Learning Specialty

A data scientist needs to create a model for predictive maintenance. The model will be based on historical data to identify rare anomalies in the data. The historical data is stored in an Amazon S3 bucket. The data scientist needs to use Amazon SageMaker Data Wrangler to ingest the data. The data scientist also needs to perform exploratory data analysis (EDA) to understand the statistical properties of the data.

Which solution will meet these requirements with the LEAST amount of compute resources?

Import the data by using the None option.

Import the data by using the Stratified option.

Import the data by using the First K option. Infer the value of K from domain knowledge.

Import the data by using the Randomized option. Infer the random size from domain knowledge.

Explanations:

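The None option applies no sampling, so Data Wrangler imports the entire dataset. This would support the analysis, but it processes every record and therefore uses more compute resources than any of the sampled imports. This option does not meet the requirement to use the least amount of compute resources.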

The Stratified option is typically used for ensuring that the sample maintains the same proportion of classes as the whole dataset. While this could be useful for certain types of analyses, it may involve additional computation to maintain these proportions, making it less efficient for the stated requirement of using the least amount of compute resources.
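To make the extra work concrete, here is a minimal sketch of stratified sampling in plain Python. It is not Data Wrangler's implementation; the function `stratified_sample`, the `label_of` callback, and the toy rows are all assumptions made for illustration. The point it shows is that every row has to be scanned and grouped by class before any sample can be drawn.

```python
import random
from collections import defaultdict

def stratified_sample(rows, label_of, fraction, seed=0):
    """Draw the same fraction from each class (hypothetical helper)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:                       # full pass: every row is read and grouped
        groups[label_of(row)].append(row)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))  # keep at least one row per class
        sample.extend(rng.sample(members, k))
    return sample

# Toy data: 990 normal records and 10 rare anomalies (made-up values).
rows = [("normal", i) for i in range(990)] + [("anomaly", i) for i in range(10)]
sample = stratified_sample(rows, label_of=lambda r: r[0], fraction=0.1)
anomalies = [r for r in sample if r[0] == "anomaly"]
print(len(sample), len(anomalies))
```

The sample keeps the class proportions, but only after the whole dataset has been read once.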

The First K option allows the data scientist to import a specified number of records based on domain knowledge. This is efficient as it reduces the amount of data processed while still allowing for sufficient exploratory data analysis (EDA) to understand the statistical properties without overloading the system.
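As a rough illustration of why taking the first K records is cheap, the sketch below reads from a lazy row source and stops after K rows. It is only an analogy in plain Python, not how Data Wrangler works internally; `counting_reader`, `first_k`, and the record fields are hypothetical names.

```python
from itertools import islice

def counting_reader(n, counter):
    """Hypothetical lazy source of n records that counts how many were read."""
    for i in range(n):
        counter["read"] += 1
        yield {"sensor_id": i, "reading": i * 0.5}

def first_k(rows, k):
    """Take the first k rows and stop; later rows are never touched."""
    return list(islice(rows, k))

counter = {"read": 0}
sample = first_k(counting_reader(1_000_000, counter), k=1000)
print(len(sample), counter["read"])  # prints "1000 1000"
```

Only K rows are read, no matter how large the source is, which is where the compute saving comes from.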

The Randomized option involves randomly selecting a subset of the data, which may require additional compute resources for sampling and could introduce variability that complicates EDA. This approach does not guarantee that the most relevant data for anomaly detection is included, making it less efficient than the First K option.
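One way to see the extra cost of random sampling is reservoir sampling, a standard technique for drawing a uniform sample from a stream of unknown length. The sketch below is an assumption-laden illustration, not a description of Data Wrangler's sampler; `reservoir_sample` and `counting_reader` are hypothetical helpers. It shows that a uniform sample of size K still requires reading every row.

```python
import random

def counting_reader(n, counter):
    """Hypothetical lazy source of n records that counts how many were read."""
    for i in range(n):
        counter["read"] += 1
        yield i

def reservoir_sample(rows, k, seed=0):
    """Uniform random sample of k rows from a stream (reservoir sampling)."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)          # fill the reservoir with the first k rows
        else:
            j = rng.randint(0, i)          # inclusive bounds: 0 <= j <= i
            if j < k:
                reservoir[j] = row         # replace with probability k / (i + 1)
    return reservoir

counter = {"read": 0}
sample = reservoir_sample(counting_reader(100_000, counter), k=1000)
print(len(sample), counter["read"])
```

The sample has K rows, yet the read counter equals the full size of the source.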
