Solved – Measuring Statistical Significance of Binary Classification using Matthews Correlation Coefficient

correlation, machine learning, mathematical-statistics, statistical significance

Based on the following relationship between Matthew's Correlation Coefficient (MCC) and Chi Square:

$$|\mathrm{MCC}| = \sqrt{\frac{\chi^2}{n}}$$

where n is the total number of observations.

(MCC is the Pearson product-moment Correlation Coefficient)

Is it possible to conclude that:

By having:

Imbalanced Binary Classification Problem, N = 1000, and χ² >= 3.85 (p < 0.05, df = 1)

  1. Following MCC is significant:

      MCC >= sqrt ( 3.85 / 1000 ) which is MCC >= 0.06
    
  2. When comparing two algorithms (A, B) with trials of 100 times:

    IF mean(MCC_A1..MCC_A100) – mean(MCC_B1..MCC_B100) > 0.06 THEN:

      A significantly outperforms B
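
The relationship used in point 1 can be checked numerically. Below is a minimal sketch, assuming a made-up confusion matrix with N = 1000 (the counts are purely illustrative) and the tabulated critical value 3.841 for χ² at p = 0.05, df = 1. The function names are hypothetical, not from any library, and the χ² statistic is the plain Pearson statistic without continuity correction.

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

def chi_square(tp, fp, fn, tn):
    """Pearson chi-square statistic of the 2x2 table (no continuity correction)."""
    n = tp + fp + fn + tn
    observed = [[tp, fn], [fp, tn]]   # rows: actual class, columns: predicted class
    row_totals = [tp + fn, fp + tn]
    col_totals = [tp + fp, fn + tn]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat

# Hypothetical, imbalanced example: 50 positives out of 1000 samples.
tp, fp, fn, tn = 30, 40, 20, 910
n = tp + fp + fn + tn

# Assumed critical value for chi-square with df = 1 at p = 0.05.
CRITICAL_CHI2 = 3.841
mcc_threshold = math.sqrt(CRITICAL_CHI2 / n)

print("MCC:", mcc(tp, fp, fn, tn))
print("chi-square:", chi_square(tp, fp, fn, tn))
print("n * MCC^2:", n * mcc(tp, fp, fn, tn) ** 2)
print("smallest significant |MCC| for this n:", mcc_threshold)
```

Note that this threshold applies to a single MCC tested against zero association; it says nothing by itself about the difference between two mean MCC values as in point 2.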
    

Thanks in advance!

Edit: ROC curves provide an overly optimistic view of performance for imbalanced binary classification.

Regarding the threshold, I'm not a big fan of leaving it unspecified, since in the end someone has to decide on a threshold, and quite frankly, that person has no more information than I do to decide on one.
Hence, providing PR or ROC curves is just a way of circumventing the problem for the sake of publishing.

Best Answer

I am not sure whether your question is entirely correct. The Matthews Correlation Coefficient allows you to evaluate the performance of a single classifier. The closer the value of the coefficient is to 1, the better. A value close to zero means that your classifier behaves nearly randomly (i.e. it would be like tossing a fair coin).

Now, if you want to compare two classifiers, you could compare their respective Matthews Correlation Coefficients. But that is also problematic because, at least in general, your algorithm uses a threshold to make decisions: does this sample belong to class 1 or 2? This threshold allows you to control the true positive and false positive rates, i.e. how tolerant you are of mistakes. Thus it is generally preferred to calculate the ROC curves of both classifiers.
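
To illustrate how the threshold trades true positives against false positives, here is a small sketch on made-up labels and scores (both hypothetical, as are the function names). Each threshold gives one (FPR, TPR) point, and sweeping the threshold traces out the ROC curve.

```python
def confusion(y_true, scores, threshold):
    """Count TP, FP, FN, TN when predicting class 1 for scores >= threshold."""
    tp = fp = fn = tn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def rates(y_true, scores, threshold):
    """Return (true positive rate, false positive rate) at the given threshold."""
    tp, fp, fn, tn = confusion(y_true, scores, threshold)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

# Hypothetical labels and classifier scores, for illustration only.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]

# Each threshold yields one point of the ROC curve.
for threshold in (0.3, 0.5, 0.65, 0.8):
    tpr, fpr = rates(y_true, scores, threshold)
    print(f"threshold={threshold}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

A lower threshold catches more positives at the cost of more false alarms, which is why a single-threshold metric such as MCC depends on where the threshold is set.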

Another possibility would be to perform McNemar's test, which compares two classifiers on the same test samples.
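
As a rough sketch of how McNemar's test could be applied here, assume both classifiers were evaluated on the same test set. The version below uses the χ² approximation with continuity correction, the tabulated critical value 3.841 (df = 1, p = 0.05), and hypothetical function names and counts; for small numbers of disagreements an exact binomial version would be more appropriate.

```python
def discordant_counts(y_true, pred_a, pred_b):
    """Count samples where exactly one of the two classifiers is correct.

    b: A correct and B wrong; c: A wrong and B correct.
    """
    b = c = 0
    for y, a, p in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = (a == y), (p == y)
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1
    return b, c

def mcnemar_statistic(b, c):
    """McNemar chi-square statistic with continuity correction (df = 1)."""
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)

# Assumed critical value for chi-square with df = 1 at p = 0.05.
CRITICAL_CHI2 = 3.841

# Hypothetical disagreement counts, for illustration only.
b, c = 15, 5
stat = mcnemar_statistic(b, c)
print("McNemar statistic:", stat)
print("significant at p < 0.05:", stat > CRITICAL_CHI2)
```

Only the samples on which the two classifiers disagree enter the statistic; samples that both get right or both get wrong carry no information about which classifier is better.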