There are many metrics for evaluating the performance of a predictive model. Many of these seem relatively straightforward to me (e.g. Accuracy, Kappa, AUC-ROC, etc.), but I am uncertain about the McNemar test. Could someone kindly help me understand how to interpret the McNemar test on a predictive model's contingency table? Here it is applied to that table, and the p-value is the one returned by the R function caret::confusionMatrix. Everything I read about McNemar describes comparing before and after a 'treatment'. In this case, I would be comparing the predicted classes against the known test classes. Am I correct to interpret a significant McNemar test as meaning that the proportion of classes differs between the test classes and the predicted classes?
A second, more general, follow-up question is how this should factor into interpreting the performance of a predictive model. For example, as reflected in the first example below, in some circumstances 75% accuracy may be considered great, yet the proportions of the predicted classes may differ from those of the test set (assuming my understanding of a significant McNemar test is accurate). How would one approach such a circumstance?
Lastly, does this interpretation change if more classes are involved, for example with a contingency matrix of 3×3 or larger?
Here are some reproducible examples, mirrored from here:
#significant p-value
mat <- matrix(c(661,36,246,207), nrow=2)
caret::confusionMatrix(as.table(mat))
Confusion Matrix and Statistics

          Reference
Prediction   A   B
         A 661 246
         B  36 207

               Accuracy : 0.7548
                 95% CI : (0.7289, 0.7794)
    No Information Rate : 0.6061
    P-Value [Acc > NIR] : < 2.2e-16

                  Kappa : 0.4411

 Mcnemar's Test P-Value : < 2.2e-16
... truncated
# non-significant p-value
mat <- matrix(c(663,46,34,407), nrow=2)
caret::confusionMatrix(as.table(mat))
Confusion Matrix and Statistics

          Reference
Prediction   A   B
         A 663  34
         B  46 407

               Accuracy : 0.9304
                 95% CI : (0.9142, 0.9445)
    No Information Rate : 0.6165
    P-Value [Acc > NIR] : <2e-16

                  Kappa : 0.8536

 Mcnemar's Test P-Value : 0.2188
... truncated
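For reference, the same p-value can be reproduced directly with base R's stats::mcnemar.test(), which by default applies a continuity correction to the 2×2 table:

```r
# Reproduce the McNemar p-value from the second example using base R.
mat <- matrix(c(663, 46, 34, 407), nrow = 2)
mcnemar.test(as.table(mat))  # p-value ≈ 0.2188, matching caret's output
```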
Best Answer
Interpret the McNemar’s Test for Classifiers
McNemar's test looks only at the disagreements between the two sets of labels (here, the predictions and the reference labels): the No/Yes and Yes/No cells of the confusion matrix (the A/B and B/A cells in your case). The test checks whether the counts in these two cells differ significantly. That is all.
If these cells have similar counts, the two directions of disagreement occur in much the same proportion (here: A predicted as B about as often as B predicted as A), just on different instances of the test set, so the predicted class proportions match those of the reference labels. In that case the result of the test is not significant and the null hypothesis is not rejected.
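To make this concrete, here is a minimal sketch of the statistic behind the test, computed by hand from the two discordant cells, using the continuity correction that mcnemar.test() applies by default (mcnemar_p is an illustrative helper name, not a caret function):

```r
# Chi-squared statistic from the two discordant (off-diagonal) cells,
# with the continuity correction mcnemar.test() uses by default.
# 'mcnemar_p' is an illustrative helper, not part of caret.
mcnemar_p <- function(b, c) {
  chi2 <- (abs(b - c) - 1)^2 / (b + c)
  pchisq(chi2, df = 1, lower.tail = FALSE)
}

mcnemar_p(246, 36)  # first example: essentially 0, highly significant
mcnemar_p(34, 46)   # second example: ~0.2188, not significant
```

With 246 vs. 36 the discordant counts are wildly unbalanced, so the test is highly significant; with 34 vs. 46 they are close, so it is not.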
More information can be found here:
https://machinelearningmastery.com/mcnemars-test-for-machine-learning/