Which process will improve the testing accuracy the MOST?
Use a one-hot encoder for the categorical fields in the dataset. Perform standardization on the financial fields in the dataset. Apply L1 regularization to the data.
Use tokenization of the categorical fields in the dataset. Perform binning on the financial fields in the dataset. Remove the outliers in the data by using the z-score.
Use a label encoder for the categorical fields in the dataset. Perform L1 regularization on the financial fields in the dataset. Apply L2 regularization to the data.
Use a logarithm transformation on the categorical fields in the dataset. Perform binning on the financial fields in the dataset. Use imputation to populate missing values in the dataset.
Explanations:
One-hot encoding lets the model treat each categorical level as an independent feature, standardization puts the financial fields on a comparable scale, and L1 regularization performs feature selection, which prevents overfitting and improves generalization to the test data.
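A minimal sketch of this combination, assuming a hypothetical loan dataset (the column names and synthetic values below are illustrative, not from the question): categorical fields are one-hot encoded, financial fields are standardized, and the classifier uses an L1 penalty.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

# Hypothetical dataset: two categorical fields and two financial fields.
df = pd.DataFrame({
    "employment_type": ["salaried", "self_employed", "salaried", "contract"] * 25,
    "region": ["north", "south", "east", "west"] * 25,
    "income": [52000, 87000, 43000, 120000] * 25,
    "loan_amount": [10000, 25000, 5000, 40000] * 25,
    "default": [0, 1, 0, 1] * 25,
})

X = df.drop(columns="default")
y = df["default"]

preprocess = ColumnTransformer([
    # One-hot encode categoricals so each level becomes its own feature.
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["employment_type", "region"]),
    # Standardize financial fields so they share a comparable scale.
    ("num", StandardScaler(), ["income", "loan_amount"]),
])

model = Pipeline([
    ("prep", preprocess),
    # L1 penalty drives uninformative coefficients to zero (feature selection).
    ("clf", LogisticRegression(penalty="l1", solver="liblinear", C=1.0)),
])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```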
Tokenization is typically used for text data and is not appropriate for categorical variables. Binning may reduce the model's ability to capture important patterns, and z-score outlier removal can be too simplistic for complex datasets.
Label encoding imposes an ordinal relationship on categorical features, which is inappropriate for non-ordinal data (see the sketch below). Regularization methods (L1 and L2) may be effective but do not directly address the issues of scaling or encoding.
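A small illustration, using assumed region categories, of why label encoding can mislead a model on non-ordinal data: the encoder maps unordered categories to integers, which implies a spurious ordering, whereas one-hot encoding gives each category its own binary column.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

regions = np.array(["north", "south", "east", "west"])

# LabelEncoder assigns integers alphabetically: east=0, north=1, south=2, west=3,
# implicitly suggesting east < north < south < west to a linear model.
label_encoded = LabelEncoder().fit_transform(regions)
print(label_encoded)  # [1 2 0 3]

# OneHotEncoder creates one independent binary column per region instead.
one_hot = OneHotEncoder().fit_transform(regions.reshape(-1, 1)).toarray()
print(one_hot)
```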
Logarithmic transformations are not suitable for categorical fields. Binning can also lose useful information. Imputation of missing values is necessary, but it does not address the broader issue of overfitting and feature handling.