How should the company prepare the data for the model to improve the model’s accuracy?
Adjust the class weight to account for each machine type.
Oversample the failure cases by using the Synthetic Minority Oversampling Technique (SMOTE).
Undersample the non-failure events. Stratify the non-failure events by machine type.
Undersample the non-failure events by using the Synthetic Minority Oversampling Technique (SMOTE).
Explanations:
Adjusting class weights by machine type does not address the actual imbalance, which is between failure and non-failure events. With only 100 failure cases, the model would still see too few failures to learn from, no matter how the machine types are weighted.
Oversampling failure cases using SMOTE is appropriate for handling class imbalance. SMOTE generates synthetic examples of the minority class (failures) to balance the dataset, helping the model better learn to predict failure events despite their rarity.
Undersampling non-failure events discards training data, which is not ideal. Stratifying by machine type only preserves the machine-type mix among the non-failure events that are kept. It adds no information about failures, so the model still has the same small set of failure cases to learn from.
SMOTE is an oversampling technique: it creates synthetic examples of the minority class (failure cases). It cannot be used to undersample the majority class, so this option contradicts itself. The correct approach is to oversample the minority class, not to undersample the majority class.
In my opinion, the answer is:
Oversample the failure cases by using the Synthetic Minority Oversampling Technique (SMOTE).
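To make the selected answer more concrete, here is a minimal sketch of the core idea behind SMOTE: each synthetic failure sample is placed at a random point on the line between a real failure sample and one of its nearest failure-class neighbors. This is a simplified illustration in plain Python, not the implementation used by any particular library. The function name `smote_sketch`, the feature names, and the toy values are all assumptions made for this example.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating between neighbors.

    Simplified illustration of the SMOTE idea (hypothetical helper).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a real minority sample at random.
        base = rng.choice(minority)
        # Find its k nearest minority-class neighbors, excluding itself.
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        # Place the new point at a random position between base and neighbor.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Hypothetical toy data: four failure readings as (temperature, vibration).
failures = [(90.0, 0.8), (92.0, 0.9), (88.0, 0.7), (95.0, 1.0)]
new_failures = smote_sketch(failures, n_synthetic=6, k=2)
balanced_failures = failures + new_failures
```

In practice, a library implementation such as the `SMOTE` class in imbalanced-learn would normally be used instead of hand-written code, and oversampling is usually applied to the training split only so that synthetic points do not leak into evaluation data.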