How can the ML specialist meet these requirements with the LEAST operational overhead?
Load the data into an Amazon SageMaker Studio notebook. Calculate the first and third quartiles. Use a SageMaker Data Wrangler data flow to remove only values that are outside of those quartiles.
Use an Amazon SageMaker Data Wrangler bias report to find outliers in the dataset. Use a Data Wrangler data flow to remove outliers based on the bias report.
Use an Amazon SageMaker Data Wrangler anomaly detection visualization to find outliers in the dataset. Add a transformation to a Data Wrangler data flow to remove outliers.
Use Amazon Lookout for Equipment to find and remove outliers from the dataset.
Explanations:
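Loading the data into a SageMaker Studio notebook and computing the first and third quartiles by hand adds manual steps outside Data Wrangler, which increases operational overhead. The rule itself is also flawed. Removing every value outside the range from Q1 to Q3 discards roughly half of the dataset, not just the outliers. The conventional interquartile range (IQR) rule keeps values between Q1 - 1.5 × IQR and Q3 + 1.5 × IQR and removes only the values beyond those fences.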
The Amazon SageMaker Data Wrangler bias report, which is powered by SageMaker Clarify, measures pre-training bias in the data, such as class imbalance across a facet, relative to a target label. It does not detect outliers, so it is not a suitable basis for deciding which values to remove.
Amazon SageMaker Data Wrangler includes an anomaly detection visualization that highlights outliers in the dataset without any custom code. Adding an outlier-removal transformation to the same Data Wrangler data flow keeps detection and removal in one managed, repeatable workflow. This makes it the option with the least operational overhead.
Amazon Lookout for Equipment detects abnormal behavior in industrial equipment from sensor data. It is a monitoring service that flags anomalies and raises alerts; it does not clean a dataset or produce a filtered copy of it. It therefore does not fit the requirement of preparing a dataset for ML model training.
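To illustrate why the rule in the first option is flawed, the following is a minimal sketch using only the Python standard library. It compares keeping only the values between Q1 and Q3 with the conventional IQR fences (k = 1.5). The function names and the ten-value sample are hypothetical and for illustration only. Data Wrangler does not run this code, and its own quartile method may interpolate differently.

```python
from statistics import quantiles


def quartiles(values):
    # Q1 and Q3 via linear interpolation ("inclusive" treats the data as the full population).
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3


def trim_to_quartiles(values):
    # The rule from the first option: keep only values inside [Q1, Q3].
    q1, q3 = quartiles(values)
    return [v for v in values if q1 <= v <= q3]


def remove_iqr_outliers(values, k=1.5):
    # Tukey's fences: drop values below Q1 - k*IQR or above Q3 + k*IQR.
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical sample: nine typical readings and one obvious outlier (95).
data = [10, 11, 12, 12, 13, 13, 14, 15, 16, 95]

print(trim_to_quartiles(data))    # [12, 12, 13, 13, 14]
print(remove_iqr_outliers(data))  # [10, 11, 12, 12, 13, 13, 14, 15, 16]
```

With this sample, Q1 = 12.0 and Q3 = 14.75. Trimming to the quartiles drops five of the ten values, including the legitimate readings 10, 11, 15, and 16. The IQR fences (7.875 to 18.875) drop only the outlier 95.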