Benefit of the Matthews correlation coefficient (MCC) over the average of sensitivity and specificity for imbalanced data

binary-data, classification, unbalanced-classes

Assume we have two-class data in which the number of samples per class differs greatly (the data is imbalanced) and the penalty for misclassification is the same for both classes. For assessing the performance of a binary classifier on such data,
what are the benefits of the Matthews correlation coefficient (MCC) over simply averaging sensitivity (true positive rate) and specificity (true negative rate)?

Best Answer

MCC is a correlation coefficient between the target and the predictions. It ranges from -1 to +1: it is -1 when actuals and predictions disagree perfectly, +1 when they agree perfectly, and 0 when the predictions may as well be random with respect to the actuals. Because it uses all four quadrants of the confusion matrix (TP, TN, FP, FN), it is considered a balanced measure.
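For reference, the two scores being compared can be written in terms of the confusion-matrix counts. This is the standard definition of MCC; the label BA for the average of sensitivity and specificity is only shorthand introduced here.

$$
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}},
\qquad
\mathrm{BA} = \frac{1}{2}\left(\frac{TP}{TP+FN} + \frac{TN}{TN+FP}\right)
$$

BA lies between 0 and 1, while the numerator of MCC can be negative, which is where its sign comes from.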

Consider a case where the number of positive or negative cases is very low and the classifier returns TP = 0 or TN = 0. Averaging TPR and TNR then gives a score without any direction: it lies between 0 and 1 and does not by itself say whether the predictions are positively or negatively associated with the actuals, so we cannot judge a model based on this score. MCC uses all four quadrants of the confusion matrix and, being a balanced measure, returns a value with a direction (positive or negative).
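The TP = 0 case can be checked numerically. Below is a minimal sketch, not a reference implementation: the function names and the counts (10 positives, 1000 negatives) are made up for illustration, and returning 0 when the MCC denominator is zero is a common convention assumed here rather than something stated above.

```python
import math


def balanced_accuracy(tp, fp, tn, fn):
    """Average of sensitivity (TPR) and specificity (TNR)."""
    tpr = tp / (tp + fn)  # assumes at least one actual positive
    tnr = tn / (tn + fp)  # assumes at least one actual negative
    return (tpr + tnr) / 2


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumption: treat the undefined case (an empty row or column) as 0
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 10 actual positives, 1000 actual negatives,
# and a classifier that never finds a true positive (TP = 0)
counts = dict(tp=0, fp=10, tn=990, fn=10)
print("average of TPR and TNR:", balanced_accuracy(**counts))
print("MCC:", mcc(**counts))
```

With these made-up counts the average of TPR and TNR is 0.495, close to the 0.5 one gets from guessing, while MCC is -0.01: small in magnitude, but its sign shows that the predictions are slightly negatively associated with the actuals.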

There is a great explanation of MCC and accuracy measures in the article below:

https://lettier.github.io/posts/2016-08-05-matthews-correlation-coefficient.html