Which data transformation step should the data scientist take to improve the predictions of the model?
One-hot encoding
Cartesian product transformation
Quantile binning
Normalization
Explanations:
One-hot encoding is used for categorical variables to convert them into a numerical format. It is not suitable for addressing non-linear relationships or skewness in numerical features like duration.
Cartesian product transformation generates combinations of two or more categorical or text input variables to form interaction features. It does not combine datasets, and it does not address the non-linear relationship between duration and book sales.
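Quantile binning splits the skewed numerical feature into bins whose boundaries are quantiles, so each bin holds roughly the same number of observations. The bins are then treated as categorical features, which lets the model learn a separate effect for each range of duration and so capture non-linear patterns. This is the transformation that can improve the model's predictions.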
Normalization rescales numerical features to a common scale, typically the range [0, 1] or [-1, 1]. Because the rescaling is linear, it leaves the shape of the distribution unchanged, so it does not address the skewness of duration or the non-linearity of its relationship with book sales.
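The difference between quantile binning and normalization can be illustrated with a minimal sketch. The duration values below are hypothetical and chosen only to be right-skewed; the helper names (`quantile_bin_edges`, `assign_bin`, `one_hot`, `min_max`) are assumptions made for this example and do not come from any particular ML service. It uses only the Python standard library.

```python
import bisect
import statistics


def quantile_bin_edges(values, n_bins):
    """Return the n_bins - 1 interior cut points that split values into quantile bins."""
    return statistics.quantiles(values, n=n_bins, method="inclusive")


def assign_bin(value, edges):
    """Map a value to a bin index between 0 and len(edges), inclusive."""
    return bisect.bisect_right(edges, value)


def one_hot(index, n_bins):
    """Encode a bin index as a one-hot vector, one position per bin."""
    return [1 if i == index else 0 for i in range(n_bins)]


def min_max(values):
    """Linearly rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical, right-skewed durations (illustrative values only).
durations = [1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 15, 40]
n_bins = 4

# Quantile binning: every bin receives a similar share of the observations.
edges = quantile_bin_edges(durations, n_bins)
bins = [assign_bin(d, edges) for d in durations]
bin_counts = [bins.count(b) for b in range(n_bins)]
encoded = [one_hot(b, n_bins) for b in bins]

# Min-max normalization: the values are rescaled but remain just as skewed.
normalized = min_max(durations)

print("bin edges:", edges)
print("observations per bin:", bin_counts)
print("normalized:", [round(v, 3) for v in normalized])
```

After binning, each duration becomes a one-hot vector, so a linear model can assign a separate weight to each range. After normalization, most values still crowd near the low end of the scale, because the single large value stretches the range.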