Solved – Which metrics to focus on classification problem with imbalanced classes

classification, unbalanced-classes

I am currently working on a binary classification project (e-commerce) in which the buyer class (class 1) is the minority class and the class of interest. A classification model can make two types of error:

  1. predicting a buyer as a non-buyer
  2. predicting a non-buyer as a buyer.

For the minority class (class 1):

FP means we have predicted 1, while the actual class is 0, i.e. we classify a non-buyer as a buyer (precision gets lower).

FN means that we have predicted 0, while the actual class is 1, i.e. we miss an actual buyer (recall gets lower).

My question is which metrics I should use to compare different models. Accuracy is apparently not a good metric here because of the class imbalance.
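To make the FP/FN definitions and the accuracy concern concrete, here is a minimal sketch in plain Python. The data is made up (100 visitors, 5 buyers) and the helper names are hypothetical; scikit-learn's `precision_score` and `recall_score` would normally do this work.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1          # buyer correctly predicted as buyer
        elif p == positive:
            fp += 1          # non-buyer predicted as buyer
        elif t == positive:
            fn += 1          # buyer predicted as non-buyer (missed)
        else:
            tn += 1          # non-buyer correctly predicted as non-buyer
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    # Defined as 0.0 here when nothing is predicted positive (a convention of this sketch).
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up data: 5 buyers (class 1) among 100 visitors.
y_true = [1] * 5 + [0] * 95

# A useless "model" that always predicts non-buyer.
always_zero = [0] * 100

tp, fp, fn, tn = confusion_counts(y_true, always_zero)
baseline_accuracy = accuracy(tp, fp, fn, tn)
baseline_precision = precision(tp, fp)
baseline_recall = recall(tp, fn)

print(baseline_accuracy, baseline_precision, baseline_recall)
```

The always-predict-0 baseline reaches 95% accuracy while finding no buyers at all (recall 0), which is why accuracy alone says little on data like this.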

For both classes I will get some metrics, i.e. precision, recall and F1-score. So where should I focus more?

P.S. Given the extra information that a non-buyer incurs a small cost and a buyer brings good revenue, would that change the metrics I should focus on?
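One possible way to make the cost information concrete is to score each model by the profit implied by its confusion counts. The sketch below rests on an assumed reading of the P.S.: every visitor predicted as a buyer is targeted at a small cost, and every actual buyer who is targeted brings revenue. The function name and all numbers are hypothetical.

```python
def expected_profit(tp, fp, revenue_per_buyer, cost_per_contact):
    """Profit under the assumption that every predicted buyer is contacted.

    tp: buyers correctly predicted as buyers (each brings revenue)
    fp: non-buyers predicted as buyers (each only incurs the contact cost)
    """
    return tp * revenue_per_buyer - (tp + fp) * cost_per_contact


# Hypothetical economics: a buyer is worth 20, a contact costs 1.
REVENUE = 20.0
COST = 1.0

# Hypothetical model A: cautious, high precision, low recall.
profit_precise = expected_profit(tp=2, fp=1, revenue_per_buyer=REVENUE, cost_per_contact=COST)

# Hypothetical model B: aggressive, low precision, high recall.
profit_recall = expected_profit(tp=4, fp=20, revenue_per_buyer=REVENUE, cost_per_contact=COST)

print(profit_precise, profit_recall)
```

With these made-up numbers the high-recall model earns more despite its many false positives, which suggests that when a missed buyer is much more expensive than a wasted contact, recall on class 1 deserves more weight than precision.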

Best Answer

For both classes I will get some metrics, i.e. precision, recall and F1-score. So where should I focus more?

For binary classification, we usually report a single precision, recall and F1 score, computed on the "positive" class (here the buyer class, class 1). Targeting an improvement in F1 is good practice.
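As a rough illustration of F1 being tied to whichever class is called "positive", the sketch below computes it for both classes on made-up labels. The helper name is hypothetical; in practice scikit-learn's `f1_score` with its `pos_label` parameter covers this.

```python
def f1_for_class(y_true, y_pred, positive=1):
    """F1 score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: F1 is taken as 0.0 in this sketch.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 4 buyers (1) and 6 non-buyers (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

f1_buyers = f1_for_class(y_true, y_pred, positive=1)      # class of interest
f1_non_buyers = f1_for_class(y_true, y_pred, positive=0)  # majority class

print(f1_buyers, f1_non_buyers)
```

The same predictions give a different F1 for each class, so when comparing models it is the class 1 value that should be tracked here.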

Cohen's kappa is another good metric for imbalanced problems. Details can be found in this post:

Cohen's kappa in plain English
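For a feel of how kappa behaves on imbalanced data, here is a minimal sketch using the standard definition kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from the label frequencies. The data is made up and the function name is hypothetical; scikit-learn provides `cohen_kappa_score` for real use.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa between true labels and predictions."""
    n = len(y_true)
    # Observed agreement: fraction of samples where prediction equals truth.
    p_o = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    # Chance agreement: product of marginal label frequencies, summed over labels.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    p_e = sum(true_counts[label] * pred_counts[label] for label in labels) / (n * n)
    if p_e == 1.0:
        # Kappa is undefined here; returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return (p_o - p_e) / (1 - p_e)


# Made-up data: 5 buyers among 100 visitors.
y_true = [1] * 5 + [0] * 95

# Baseline that always predicts non-buyer (accuracy 0.95).
always_zero = [0] * 100

# Hypothetical model: finds 4 of 5 buyers, with 4 false alarms (accuracy also 0.95).
model_pred = [1, 1, 1, 1, 0] + [1] * 4 + [0] * 91

kappa_baseline = cohen_kappa(y_true, always_zero)
kappa_model = cohen_kappa(y_true, model_pred)

print(kappa_baseline, kappa_model)
```

Both sets of predictions have the same accuracy, yet kappa is 0 for the trivial baseline and clearly positive for the model that actually finds buyers, because kappa discounts the agreement that the class frequencies alone would produce.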