Which algorithm will meet these requirements?
K-nearest neighbors (k-NN) with dimension reduction
Linear learner with early stopping
K-means
Principal component analysis (PCA) with the algorithm mode set to random
Explanations:
K-nearest neighbors (k-NN) is effective for classification tasks and can identify the training points most similar to a given test point. Combining it with dimension reduction (e.g., PCA) reduces memory costs while retaining the essential features, making it suitable for the company’s requirements.
The linear learner with early stopping may train efficiently, but it does not inherently support finding similar data points, which is a key requirement. Additionally, early stopping shortens training time rather than shrinking the feature space, so it does not meaningfully reduce memory usage for a large dataset with many features.
K-means is a clustering algorithm, not a classification algorithm. While it groups data points, it does not assign them to predefined categories, nor does it directly support finding similar labeled examples in a supervised learning context.
Principal component analysis (PCA) is a dimensionality reduction technique, not a classification algorithm. While it can help reduce memory costs, it does not classify or find similar data points on its own. Setting the algorithm mode to random only changes how the components are computed (an approximation suited to very large datasets); it does not make PCA suitable for the classification task.