Which metrics should the data scientist use to optimize the model?

By: study aws cloud

On: January 11, 2025

Tagged: Machine Learning Specialty


A financial company is trying to detect credit card fraud. The company observed that, on average, 2% of credit card transactions were fraudulent. A data scientist trained a classifier on a year’s worth of credit card transaction data. The model needs to identify the fraudulent transactions (positives) among the regular ones (negatives). The company’s goal is to accurately capture as many positives as possible.

Which metrics should the data scientist use to optimize the model?

(Choose two.)

Specificity

False positive rate

Accuracy

Area under the precision-recall curve

True positive rate

Explanations:

Incorrect. Specificity, TN / (TN + FP), measures the model’s ability to correctly identify negatives (legitimate transactions), which is not the priority when the goal is to capture as many positives as possible.

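Incorrect. The false positive rate, FP / (FP + TN) (equivalently 1 - specificity), measures how often negatives are incorrectly classified as positives. It says nothing about how many fraudulent transactions are missed, so it does not track the goal of detecting positives.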
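A minimal Python sketch of why specificity and FPR can look perfect while the fraud goal fails completely. It computes both from confusion-matrix counts using specificity = TN / (TN + FP) and FPR = FP / (FP + TN). The function name and the counts (1,000 transactions at the question’s 2% fraud rate, scored by a hypothetical model that never flags fraud) are illustrative assumptions.

```python
def specificity_and_fpr(tn: int, fp: int) -> tuple[float, float]:
    """Specificity = TN / (TN + FP); FPR = FP / (FP + TN) = 1 - specificity.

    Both use only the negative (legitimate) transactions, so neither one
    says anything about how many fraudulent transactions were missed.
    """
    negatives = tn + fp
    specificity = tn / negatives
    fpr = fp / negatives
    return specificity, fpr


# Hypothetical counts: 980 legitimate and 20 fraudulent transactions, scored
# by a model that never flags fraud (TN = 980, FP = 0; all 20 frauds missed).
spec, fpr = specificity_and_fpr(tn=980, fp=0)
print(spec, fpr)  # 1.0 0.0
```

Perfect specificity and zero FPR here coexist with catching none of the 20 fraudulent transactions.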

Incorrect. Accuracy is misleading for imbalanced classes: with only 2% of transactions fraudulent, a model that labels every transaction as legitimate is 98% accurate while catching no fraud at all.
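The accuracy trap can be reproduced in a few lines of plain Python. This is a sketch under assumed data: 1,000 hypothetical transactions matching the question’s 2% fraud rate, and a trivial "model" that predicts legitimate (0) for everything. The helper names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred):
    """TP / (TP + FN): fraction of actual frauds (label 1) that were flagged."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)


# Hypothetical labels: 20 fraudulent (1) and 980 legitimate (0) transactions.
y_true = [1] * 20 + [0] * 980
# A trivial classifier that never flags fraud.
y_pred = [0] * 1000

print(accuracy(y_true, y_pred))  # 0.98
print(recall(y_true, y_pred))    # 0.0
```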

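Correct. The area under the precision-recall curve is well suited to imbalanced datasets because both precision and recall are computed around the positive class, so the large pool of true negatives (about 98% of transactions here) cannot inflate the score. A classifier with no skill scores roughly the positive rate (about 0.02 here), whereas its ROC AUC would be 0.5.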
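One common single-number summary of the area under the precision-recall curve is average precision: rank transactions by model score and average the precision observed at the rank of each actual positive. The pure-Python sketch below assumes no tied scores and uses hypothetical labels and scores; in practice a library routine such as scikit-learn’s `average_precision_score` would typically be used instead.

```python
def average_precision(y_true, scores):
    """Non-interpolated average precision (a step-wise estimate of PR-curve area).

    Sorts by score (highest first) and averages precision at the rank of
    each positive. Assumes no tied scores.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(y_true)
    tp = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            precision_sum += tp / rank  # precision among the top `rank` items
    return precision_sum / total_positives


# Hypothetical labels (1 = fraud) and model scores.
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.1]
print(average_precision(y_true, scores))  # (1/1 + 2/3) / 2, about 0.833
```

Because only the ranks of positives contribute, adding many easy negatives at the bottom of the ranking leaves the score unchanged, which is why this family of metrics suits a 2% fraud rate.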

Correct. The true positive rate (also known as recall or sensitivity), TP / (TP + FN), measures the proportion of actual positives that are correctly identified, which directly matches the goal of capturing as many fraudulent transactions as possible.
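A short sketch of how the true positive rate responds to the decision threshold, assuming a classifier that outputs a fraud score and flags a transaction when the score is at or above a threshold. The function name, labels, scores, and thresholds are hypothetical.

```python
def tpr_at_threshold(y_true, scores, threshold):
    """TPR (recall) = TP / (TP + FN) when scores >= threshold are flagged as fraud."""
    tp = sum(1 for label, score in zip(y_true, scores)
             if label == 1 and score >= threshold)
    positives = sum(y_true)  # TP + FN: every actual fraud case
    return tp / positives


# Hypothetical labels (1 = fraud) and model scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.5, 0.2, 0.1]

print(tpr_at_threshold(y_true, scores, 0.5))   # 2 of 3 frauds caught
print(tpr_at_threshold(y_true, scores, 0.25))  # 1.0
```

Lowering the threshold raises TPR, but in general it also flags more legitimate transactions, which is one reason to watch precision (for example through the precision-recall curve) alongside recall rather than maximizing TPR alone.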
