P-Value Definition – Definition of P-Value in Caret’s Confusion Matrix Method

confusion matrix, p-value, r

In the documentation for the confusion matrix method in the caret package, the p-value is described as:

a one-sided test to see if the accuracy is better than the "no information
rate," which is taken to be the largest class percentage in the data.

But what precisely is "a one-sided test" here? I assume it is some test comparing the accuracy with the no-information rate (NIR). Specifically, what does "better than the NIR" mean?

Moreover, am I correct to assume that "the largest class percentage in the data" could be computed as number of tuples with most common class label / total number of tuples?

Best Answer

If you have a class imbalance, you might want to know whether your model's accuracy is better than the proportion of the data in the majority class, which is the accuracy you would get by always predicting that class. So if you have two classes and 70% of your data are class #1, is an accuracy of 75% any better than the "no-information rate" of 70%?
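As a minimal sketch of that baseline, the no-information rate can be computed as the count of the most common class divided by the total number of samples. The label vector below is made up for illustration and is not taken from the question.

```r
# Hypothetical reference (true) labels: 70 samples of class "a", 30 of class "b"
reference <- factor(rep(c("a", "b"), times = c(70, 30)))

# Count the samples in each class
class_counts <- table(reference)

# No-information rate: share of the largest class
nir <- max(class_counts) / length(reference)
nir  # 0.7
```

This matches the formula in the question: number of samples with the most common class label divided by the total number of samples.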

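confusionMatrix uses the binom.test function to test whether the accuracy (a proportion) is greater than the no-information rate. The null hypothesis is that the true accuracy is no greater than the NIR, so a small p-value is evidence that the model does better than always predicting the majority class. The test is one-sided since you probably only care about being better than chance.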
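A rough sketch of the same kind of test, using the 70% / 75% example and an assumed test set of 100 samples (the sample size is an illustration, not something stated above). It calls binom.test directly rather than going through confusionMatrix:

```r
# Assumed numbers for illustration: 100 samples, 75 classified correctly,
# and a no-information rate of 0.70 (70 samples in the majority class)
n_total   <- 100
n_correct <- 75
nir       <- 70 / n_total

# One-sided exact binomial test: H0 accuracy <= NIR, H1 accuracy > NIR
acc_test <- binom.test(n_correct, n_total, p = nir, alternative = "greater")
acc_test$p.value

# The same p-value from the binomial tail: P(X >= 75) when X ~ Binomial(100, 0.7)
tail_prob <- pbinom(n_correct - 1, n_total, nir, lower.tail = FALSE)
```

With these assumed counts the p-value is well above 0.05, so an accuracy of 75% on 100 samples would not be convincing evidence of beating a 70% no-information rate. The same accuracy on a much larger test set could be.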

Max
