Solved – Statistically comparing classifiers using only confusion matrices (or average accuracies)

cross-validation, machine learning, model selection, statistical significance

Is it possible to perform a statistical test to determine if one classifier is better than the other using only the confusion matrices of these classifiers?

What about the average accuracies from k-fold cross validation?

I have a number of confusion matrices and average accuracies for classifiers obtained through k-fold cross-validation (done using RapidMiner). The data sets for these classifiers are all the same, though the splitting into folds was done independently for each classifier. Given two of these classifiers, A and B, I'd like to test whether A is statistically better than B using only the confusion matrices and/or the average accuracies of A and B.

All the statistical tests I've found so far require knowing the number of samples that A classified correctly but B did not, and vice versa (McNemar's test, for example). I can generate these counts if necessary, but I'd like to avoid it if reasonably possible.
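For reference, if I did generate those counts, an exact McNemar's test would only take a few lines. A minimal sketch, where the disagreement counts `n01` and `n10` are hypothetical placeholder values:

```python
# Exact McNemar's test from the paired disagreement counts.
from scipy.stats import binomtest

n01 = 25  # hypothetical: samples A got right but B got wrong
n10 = 40  # hypothetical: samples B got right but A got wrong

# Under H0 (equal accuracy), each disagreement is equally likely to
# favor A or B, so n01 ~ Binomial(n01 + n10, 0.5).
result = binomtest(n01, n01 + n10, p=0.5)
print(f"exact McNemar p-value: {result.pvalue:.4f}")
```

Note that only the disagreements enter the test; samples both classifiers get right or both get wrong carry no information about which classifier is better.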

Best Answer

You want to test whether $p_A - p_B > 0$, where $p_A$ and $p_B$ are the accuracies of the classifiers. To test this, you need an estimate of $p_A - p_B$ and of $\operatorname{Var}(p_A - p_B) = \operatorname{Var}(p_A) + \operatorname{Var}(p_B) - 2\operatorname{Cov}(p_A, p_B)$. Because both classifiers are evaluated on the same samples, their per-sample correct/incorrect indicators are correlated, so the covariance term is generally nonzero. Without knowing which samples each classifier gets right or wrong, you cannot estimate that covariance, and thus you can't statistically compare the classifiers.
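To make the point concrete, here is a minimal sketch of the paired z-test that becomes possible once per-sample correct/incorrect indicators are available. The arrays `a` and `b` and the degree of correlation between them are entirely hypothetical, simulated here only to show where the covariance term enters:

```python
# Paired z-test for a difference in accuracies, estimating the
# covariance from per-sample 0/1 correctness indicators.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 500
a = (rng.random(n) < 0.85).astype(int)  # hypothetical: 1 if A correct on sample i
flip = rng.random(n) < 0.15             # B disagrees with A on ~15% of samples
b = np.where(flip, 1 - a, a)            # hypothetical: 1 if B correct on sample i

p_a, p_b = a.mean(), b.mean()

# Var(p_A - p_B) = [Var(a_i) + Var(b_i) - 2 Cov(a_i, b_i)] / n
cov_ab = np.cov(a, b, ddof=1)[0, 1]
var_diff = (a.var(ddof=1) + b.var(ddof=1) - 2 * cov_ab) / n

z = (p_a - p_b) / np.sqrt(var_diff)
p_value = 1 - norm.cdf(z)               # one-sided: H1 is p_A - p_B > 0
print(f"p_A - p_B = {p_a - p_b:+.3f}, z = {z:.2f}, p-value = {p_value:.4f}")
```

The confusion matrices alone give you the marginal variances $\operatorname{Var}(p_A)$ and $\operatorname{Var}(p_B)$, but `cov_ab` above is exactly the quantity they cannot provide.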
