Which metrics should the data scientist use to optimize the classifier?
(Choose two.)
Specificity
False positive rate
Accuracy
F1 score
True positive rate
Correct answers: F1 score and True positive rate.
Explanations:
Specificity measures the proportion of actual negatives that are correctly identified as such. In fraud detection, where the positive class (fraudulent transactions) is rare, optimizing for specificity may lead to missing many fraudulent transactions, which is not desirable.
False positive rate measures the proportion of actual negatives incorrectly identified as positives. In the context of fraud detection, minimizing false positives is important, but it does not help in capturing fraudulent transactions effectively, which is the primary goal.
Accuracy measures the overall correctness of the model (true positives + true negatives) divided by total cases. Given the low prevalence of fraud (2%), a model could achieve high accuracy by predicting most transactions as non-fraudulent, but this would fail to capture fraudulent cases effectively.
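The accuracy pitfall above can be sketched numerically. This is a minimal illustration with assumed counts (10,000 transactions at the stated 2% fraud prevalence), not data from the question:

```python
# Hypothetical counts: 2% fraud prevalence across 10,000 transactions.
n_total = 10_000
n_fraud = 200              # 2% positive class
n_legit = n_total - n_fraud

# A degenerate model that predicts "non-fraudulent" for everything:
# every legitimate case is a true negative, every fraud case a false negative.
true_negatives = n_legit
false_negatives = n_fraud
true_positives = 0

accuracy = true_negatives / n_total        # 98% accuracy...
recall = true_positives / n_fraud          # ...while catching 0% of fraud

print(f"accuracy = {accuracy:.2%}, recall = {recall:.2%}")
```

Despite 98% accuracy, the model detects no fraud at all, which is why accuracy is misleading here.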
The F1 score is the harmonic mean of precision and recall, providing a balance between the two. It is especially useful for imbalanced classification problems such as fraud detection, where maximizing the identification of fraudulent transactions (recall) while maintaining reasonable precision is crucial.
True positive rate (sensitivity or recall) measures the proportion of actual positives (fraudulent transactions) that are correctly identified. In this scenario, optimizing for the true positive rate is critical to capturing as many fraudulent transactions as possible.
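All five metrics can be computed from a single confusion matrix, which makes the contrast concrete. The counts below are hypothetical, chosen to mirror the 2% prevalence in the scenario:

```python
# Assumed confusion-matrix counts for an imbalanced fraud dataset (2% positive).
tp, fn = 150, 50       # 200 actual fraudulent transactions
fp, tn = 100, 9700     # 9,800 actual legitimate transactions

accuracy    = (tp + tn) / (tp + tn + fp + fn)
tpr         = tp / (tp + fn)        # true positive rate (recall / sensitivity)
specificity = tn / (tn + fp)
fpr         = fp / (fp + tn)        # false positive rate = 1 - specificity
precision   = tp / (tp + fp)
f1          = 2 * precision * tpr / (precision + tpr)

# Accuracy and specificity look excellent regardless, while TPR and F1
# reveal how much fraud is actually being caught.
print(f"accuracy={accuracy:.3f}  specificity={specificity:.3f}  "
      f"fpr={fpr:.3f}  tpr={tpr:.3f}  f1={f1:.3f}")
```

Here accuracy and specificity are near-perfect even though a quarter of the fraud is missed, whereas the true positive rate (0.75) and F1 score make that gap visible; this is why those two are the metrics to optimize.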