Which feature engineering strategy should the ML specialist use with Amazon SageMaker?
Apply dimensionality reduction by using the principal component analysis (PCA) algorithm.
Drop the features with low correlation scores by using a Jupyter notebook.
Apply anomaly detection by using the Random Cut Forest (RCF) algorithm.
Concatenate the features with high correlation scores by using a Jupyter notebook.
Explanations:
Principal component analysis (PCA) is a dimensionality reduction technique that is well suited to datasets with many highly correlated features. It projects the data into a lower-dimensional space of uncorrelated components while preserving most of the variance, which reduces both the dataset's complexity and the redundancy introduced by correlated features. Given the high correlation between many feature pairs, this is the appropriate choice, and SageMaker provides PCA as a built-in algorithm.
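Outside of SageMaker's built-in algorithm, the effect of PCA on correlated features can be sketched with scikit-learn on a synthetic dataset (the data and parameter choices below are illustrative assumptions, not part of the question):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)

# Synthetic dataset: 3 underlying signals expanded into 10 highly
# correlated features (each feature is a noisy mix of the signals).
signals = rng.normal(size=(500, 3))
mix = rng.normal(size=(3, 10))
X = signals @ mix + 0.05 * rng.normal(size=(500, 10))

# Keep however many components are needed to explain 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X.shape)          # 10 correlated input features
print(X_reduced.shape)  # far fewer, mutually uncorrelated components
print(pca.explained_variance_ratio_.sum())
```

Because the 10 features carry only 3 independent signals, PCA collapses them to a handful of components while retaining almost all of the variance, which is exactly the redundancy reduction the answer describes.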
Dropping features with low correlation scores does not address the stated problem, which is high correlation among the features themselves. The redundancy comes from the highly correlated pairs, so removing uncorrelated features leaves that redundancy intact; the focus should be on reducing it, not on discarding non-correlated features.
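To make the distinction concrete, correlation analysis in a notebook is useful for *finding* redundant, highly correlated pairs rather than for dropping low-correlation features. A minimal sketch with pandas, using a hypothetical three-feature frame where `f2` is nearly a copy of `f1`:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: f2 is almost a duplicate of f1; f3 is independent.
df = pd.DataFrame({"f1": rng.normal(size=200)})
df["f2"] = df["f1"] + 0.01 * rng.normal(size=200)
df["f3"] = rng.normal(size=200)

corr = df.corr().abs()
# Keep the upper triangle (without the diagonal) so each pair appears once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack()
high_pairs = pairs[pairs > 0.9]
print(high_pairs)  # only the redundant (f1, f2) pair is flagged
```

The flagged pairs are candidates for dimensionality reduction; the low-correlation feature `f3` is precisely the one worth keeping, which is why dropping low-correlation features is the wrong move.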
Anomaly detection with Random Cut Forest (RCF) is not an appropriate approach for this feature engineering problem. RCF detects outliers and anomalies in data; it does nothing to address multicollinearity or feature redundancy.
Concatenating features with high correlation scores would not be effective. Combining correlated features may reduce the feature count, but it does not remove the redundant information those features share, so the underlying multicollinearity persists and can still encourage overfitting.