Solved – Matthews correlation coefficient with multi-class


The Matthews correlation coefficient ($\textrm{MCC}$) is a measure of the quality of a binary classification ([Wikipedia][1]). It is defined for binary classification in terms of the numbers of true positives ($TP$), false positives ($FP$), false negatives ($FN$), and true negatives ($TN$):

$$\textrm{MCC} = \frac{TP\times TN - FP\times FN}{\sqrt{\left(TP+FP\right)\left(TP+FN\right)\left(TN+FP\right)\left(TN+FN\right)}}$$

I have a case where I need to classify samples into three different classes, $A$, $B$, and $C$. Can I apply the above formula to calculate $\textrm{MCC}$ for the multi-class case after calculating the $TP$, $TN$, $FP$, and $FN$ values for each class and summing them as shown below?
$$
\begin{aligned}
TP &= TP_A + TP_B + TP_C \\
TN &= TN_A + TN_B + TN_C \\
FP &= FP_A + FP_B + FP_C \\
FN &= FN_A + FN_B + FN_C
\end{aligned}
$$

Best Answer

Yes, in general, you can. The approach you describe is sometimes called "Micro-Averaging": first sum the TPs, TNs, FPs, and FNs across all classes, and then calculate the statistic of interest from the pooled counts.
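As a rough illustration of micro-averaging, here is a minimal Python sketch. It assumes the predictions are summarized in a square confusion matrix where `cm[i][j]` counts samples of true class `i` predicted as class `j`; that layout, the helper names, the made-up example counts, and the choice to return 0 when the denominator is zero are all assumptions for illustration, not part of the question.

```python
from math import sqrt

def one_vs_rest_counts(cm, k):
    """Return (TP, FP, FN, TN) for class k, treating it as 'k vs. not k'.

    Assumes cm[i][j] = number of samples with true class i predicted as j.
    """
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                  # true k, predicted as another class
    fp = sum(row[k] for row in cm) - tp   # predicted k, actually another class
    tn = total - tp - fp - fn
    return tp, fp, fn, tn

def binary_mcc(tp, fp, fn, tn):
    """Binary MCC; returns 0.0 for a zero denominator (a convention, assumed here)."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def micro_mcc(cm):
    """Pool the per-class counts first, then apply the binary formula once."""
    counts = [one_vs_rest_counts(cm, k) for k in range(len(cm))]
    tp, fp, fn, tn = (sum(c[i] for c in counts) for i in range(4))
    return binary_mcc(tp, fp, fn, tn)

# Hypothetical 3-class confusion matrix for classes A, B, C (made-up counts).
cm = [[5, 1, 0],
      [2, 6, 1],
      [0, 1, 4]]

print(micro_mcc(cm))  # pooled counts: TP=15, FP=5, FN=5, TN=35 -> 0.625
```

Note that with one label per sample, every misclassification counts as a false positive for one class and a false negative for another, so the pooled $FP$ and $FN$ are always equal.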

Another way to combine the statistics for individual classes is so-called "Macro-Averaging": here you first calculate the statistic for each class separately (A vs. not A, B vs. not B, etc.), and then take the average of the per-class values.
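Macro-averaging can be sketched in the same spirit. The snippet below is only an illustration: it assumes a confusion matrix with `cm[i][j]` counting samples of true class `i` predicted as class `j`, uses an unweighted mean over classes, and treats a zero denominator as an MCC of 0. The helper names and example counts are made up.

```python
from math import sqrt

def one_vs_rest_counts(cm, k):
    """Return (TP, FP, FN, TN) for class k, treating it as 'k vs. not k'.

    Assumes cm[i][j] = number of samples with true class i predicted as j.
    """
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                  # true k, predicted as another class
    fp = sum(row[k] for row in cm) - tp   # predicted k, actually another class
    tn = total - tp - fp - fn
    return tp, fp, fn, tn

def binary_mcc(tp, fp, fn, tn):
    """Binary MCC; returns 0.0 for a zero denominator (a convention, assumed here)."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def macro_mcc(cm):
    """Compute one binary MCC per class, then take the unweighted mean."""
    per_class = [binary_mcc(*one_vs_rest_counts(cm, k)) for k in range(len(cm))]
    return sum(per_class) / len(per_class)

# Hypothetical 3-class confusion matrix for classes A, B, C (made-up counts).
cm = [[5, 1, 0],
      [2, 6, 1],
      [0, 1, 4]]

print(macro_mcc(cm))
```

Because each class contributes equally to the mean regardless of its size, the macro-averaged value can differ noticeably from the micro-averaged one when classes are imbalanced.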

You may have a look here for some extra details. The page talks about Precision and Recall, but I believe it applies to the Matthews correlation coefficient as well as to other statistics based on contingency tables.