What should the engineer do to improve the validation accuracy of the model?
Perform stratified sampling on the original dataset.
Acquire additional data about the majority classes in the original dataset.
Use a smaller, randomly sampled version of the training dataset.
Perform systematic sampling on the original dataset.
Explanations:
Stratified sampling splits the data so that each class appears in the training and validation datasets in the same proportion as in the original dataset. With imbalanced data, this keeps either split from under-representing or missing a class, so the model learns from a representative set of data and generalizes better.
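To make the idea concrete, here is a minimal sketch of a stratified train/validation split using only the Python standard library. The function name `stratified_split`, the 20% validation fraction, and the 90/10 label distribution are assumptions chosen for illustration; they do not come from the question.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) with each class split by the same fraction.

    Hypothetical helper for illustration only. Very small classes may round
    to zero validation samples.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # randomize within each class
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Assumed imbalanced dataset: 90 "common" samples and 10 "rare" samples.
labels = ["common"] * 90 + ["rare"] * 10
train_idx, val_idx = stratified_split(labels)

train_counts = Counter(labels[i] for i in train_idx)
val_counts = Counter(labels[i] for i in val_idx)
print(train_counts)  # 72 common, 8 rare
print(val_counts)    # 18 common, 2 rare
```

Both splits keep the original 9:1 class ratio. In practice, scikit-learn's `train_test_split` offers the same behavior through its `stratify` parameter.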
Acquiring more data for the majority classes does not address the model’s inability to generalize. It widens the imbalance, so performance on the minority classes can remain poor even with the additional data.
Using a smaller, randomly sampled version of the training dataset reduces the amount of information the model has for training. This does not address the issue of imbalance and may further reduce the model’s ability to generalize.
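Systematic sampling selects records at a fixed interval (for example, every tenth record), so it does not specifically address the class imbalance problem. If the records are ordered or follow a repeating pattern, it may introduce bias or fail to maintain proportional representation of classes, leading to poor generalization.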
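The sketch below illustrates how systematic sampling can distort class proportions. It uses a deliberately contrived dataset in which every tenth record is the minority class; the helper `systematic_sample` and the label layout are assumptions for illustration, not part of the question.

```python
from collections import Counter


def systematic_sample(items, step, start=0):
    """Take every `step`-th item beginning at index `start` (hypothetical helper)."""
    return items[start::step]


# Assumed ordering: nine "common" records followed by one "rare" record,
# repeated ten times (100 records, 10% rare).
labels = (["common"] * 9 + ["rare"]) * 10

# A sampling interval that matches the period of the data picks the same
# position in every cycle, so the result depends entirely on the start offset.
sample_a = Counter(systematic_sample(labels, step=10, start=0))
sample_b = Counter(systematic_sample(labels, step=10, start=9))

print(sample_a)  # only "common" records
print(sample_b)  # only "rare" records
```

Neither sample reflects the true 10% minority share. Real datasets are rarely this regular, but the example shows why a fixed interval gives no guarantee about class proportions.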