Solved – Denominator is Zero for Matthews correlation coefficient and F-measure

classification, unbalanced-classes

Recently, I built a classification model on an imbalanced data set (positive samples are the minority, negative samples the majority), and the model gave the following result on the test set:

True Positives = 0

True Negatives = 139

False Positives = 0

False Negatives = 10.

My question is: given this result, can the Matthews correlation coefficient (MCC) and the F-measure be used to evaluate the classifier?

Since the denominators of MCC and the F-measure are zero, they seem meaningless here. If so, MCC and the F-measure do not always work for evaluating a classifier, and sensitivity and specificity, as well as the g-mean, would be better. Is that right?

Any help is appreciated.

Best Answer

For the F-measure, this is only really a problem if you compute the precision and recall first, then plug them in: with no predicted positives, precision is $0/0$.

One can also compute the $F_1$ score as $$F_1 = \frac{2 \cdot \textrm{True Positive}}{2 \cdot \textrm{True Positive} + \textrm{False Positive} + \textrm{False Negative}}$$

Plugging in your numbers, you'll arrive at an $F_1$ score of zero, which seems appropriate since your classifier is just predicting the majority class every time.
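To make the arithmetic concrete, here is a minimal Python sketch that evaluates the count-based $F_1$ formula on the numbers from the question. The function names are mine, and the choices for the zero-denominator cases are assumptions rather than a standard: `f1_from_counts` returns NaN when there are no positives at all, and `mcc_from_counts` returns 0 when the MCC denominator vanishes, which is a common convention but still a convention.

```python
import math


def f1_from_counts(tp, fp, fn):
    """F1 computed directly from counts, without going through precision/recall."""
    denom = 2 * tp + fp + fn
    if denom == 0:
        # No actual positives and no predicted positives: F1 is undefined.
        return float("nan")
    return 2 * tp / denom


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumption: follow the common convention of reporting 0 here.
        return 0.0
    return numerator / denom


# Counts from the question.
tp, tn, fp, fn = 0, 139, 0, 10

# Precision alone is 0/0 here, so the precision/recall route breaks down.
precision = tp / (tp + fp) if (tp + fp) > 0 else None

print("precision:", precision)                    # None (undefined)
print("F1:", f1_from_counts(tp, fp, fn))          # 0.0
print("MCC:", mcc_from_counts(tp, tn, fp, fn))    # 0.0 by the convention above
```

The $F_1$ value of zero comes straight from the formula, while the MCC value of zero is only as meaningful as the convention used for its zero denominator.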

There is an information-theoretic measure called proficiency that might be of interest if you are working on fairly unbalanced data sets. The idea is that you want it to remain sensitive to both classes as either the number of true positives or negatives approaches zero. It's essentially $$ \frac{I(\textrm{predicted labels}; \textrm{actual labels})}{H(\textrm{actual labels})}$$ where $I$ is the mutual information and $H$ is the entropy.

See pages 5--7 of White et al. (2004) for more details about its calculation and interpretation.
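As a rough illustration, the ratio above can be computed from the four confusion-matrix counts. This is only a sketch of the mutual-information-over-entropy ratio as written here, not checked against the exact definition in White et al.; the function name and the choice to return NaN when the actual labels have zero entropy are my own assumptions.

```python
import math


def proficiency(tp, tn, fp, fn):
    """I(predicted; actual) / H(actual), from confusion-matrix counts."""
    n = tp + tn + fp + fn

    # Joint distribution over (actual label, predicted label).
    joint = {
        ("pos", "pos"): tp / n,
        ("pos", "neg"): fn / n,
        ("neg", "pos"): fp / n,
        ("neg", "neg"): tn / n,
    }
    # Marginals of the actual and the predicted labels.
    p_actual = {"pos": (tp + fn) / n, "neg": (tn + fp) / n}
    p_pred = {"pos": (tp + fp) / n, "neg": (tn + fn) / n}

    # Mutual information; zero-probability cells contribute nothing.
    mutual_info = sum(
        p * math.log2(p / (p_actual[a] * p_pred[b]))
        for (a, b), p in joint.items()
        if p > 0
    )
    # Entropy of the actual labels.
    entropy = -sum(p * math.log2(p) for p in p_actual.values() if p > 0)

    if entropy == 0:
        # Assumption: only one class is present, so the ratio is undefined.
        return float("nan")
    return mutual_info / entropy


print(proficiency(0, 139, 0, 10))   # counts from the question
print(proficiency(10, 139, 0, 0))   # a perfect classifier, for comparison
```

For the counts in the question, the predictions are constant, so they carry no information about the actual labels and the ratio is zero; a perfect classifier gives a ratio of one.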