Which type of pretraining bias did the ML specialist observe in the training dataset?
Difference in proportions of labels (DPL)
Class imbalance (CI)
Conditional demographic disparity (CDD)
Kolmogorov-Smirnov (KS)
Explanations:
Difference in Proportions of Labels (DPL) measures the difference in the proportion of positive labels between groups. It describes how outcomes are distributed across groups, not how many examples each age group contributes to the dataset.
Class Imbalance (CI) measures whether one group has fewer examples in the dataset than another. In this case, the 40 to 55 year-old age group has fewer examples than the other age groups, which indicates class imbalance.
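Conditional Demographic Disparity (CDD), as a pretraining metric, measures whether a group receives a larger share of the rejected outcomes than of the accepted outcomes in the dataset, conditioned on subgroups defined by another attribute. It is computed from labels rather than from group sizes, so it does not capture the under-representation of an age group described in this scenario.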
Kolmogorov-Smirnov (KS) compares two distributions. As a pretraining bias metric, it is the maximum divergence between the label distributions of two groups. It compares outcomes, so it does not directly address how well an age group is represented.
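To make the difference between CI and DPL concrete, here is a minimal Python sketch. It assumes the commonly used definitions CI = (n_a - n_d) / (n_a + n_d) and DPL = q_a - q_d, where facet d is the under-represented group and q is the proportion of positive labels in a facet. The counts are hypothetical and chosen only for illustration; they do not come from the question.

```python
def class_imbalance(n_a, n_d):
    """CI = (n_a - n_d) / (n_a + n_d); depends only on group sizes."""
    return (n_a - n_d) / (n_a + n_d)


def difference_in_proportions_of_labels(pos_a, n_a, pos_d, n_d):
    """DPL = q_a - q_d, where q is the share of positive labels in a facet."""
    return pos_a / n_a - pos_d / n_d


# Hypothetical counts (assumption, not taken from the question):
# facet d is the 40 to 55 year-old group, facet a is everyone else.
n_other, n_40_55 = 80, 20      # number of examples per facet
pos_other, pos_40_55 = 40, 5   # number of positive labels per facet

ci = class_imbalance(n_other, n_40_55)
dpl = difference_in_proportions_of_labels(pos_other, n_other, pos_40_55, n_40_55)

print(f"CI  = {ci:.2f}")   # far from 0: the 40 to 55 group is under-represented
print(f"DPL = {dpl:.2f}")  # separate question: do the groups differ in label rates?
```

CI uses only the number of examples per group, which is why it matches a scenario where one age group has fewer examples. DPL would stay the same if both groups were scaled to equal size while keeping their label proportions.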