Solved – testing equivalence for two independent AUC

First of all, sorry for the "silly" question. I have two AUCs: one from a training set and one from a validation set. I am using the roc.test function from the pROC R package to test whether the two AUCs are equivalent.

The output of my test is the following:

> roc.test(roccurve, roccurve2, paired=FALSE)

DeLong's test for two ROC curves

data:  roccurve and roccurve2
D = -2.0924, df = 620.43, p-value = 0.03681
alternative hypothesis: true difference in AUC is not equal to 0
sample estimates:
AUC of roc1 AUC of roc2 
0.7239011   0.8135604

With a p-value below 0.05, I assume that the two AUCs are not equivalent, meaning that the model works better in the validation set than in the training set. However, I have read that:
A hypothesis test for the difference in AUC can test equality, equivalence, or non-inferiority of the diagnostic tests.

Inferences about the difference between AUCs are made using a Z test. The three hypotheses of interest are:
Equality.
The null hypothesis states that the difference is equal to a specified value (usually 0), against the alternative hypothesis that it is not equal to the specified value (usually 0). When the test p-value is small you can reject the null hypothesis and conclude that the difference is not equal to the specified value (usually 0, the tests are different).

It is important to remember that a statistically significant p-value tells you nothing about the practical importance of what was observed. For a large sample, the difference for a statistically significant hypothesis test may be so small as to be practically useless. Conversely, although there may be some evidence of a difference, the sample size may be too small for the test to reach statistical significance, and you may miss an opportunity to discover a true meaningful difference.

Equivalence.
The null hypothesis states that the difference is less than a lower bound of practical equivalence or greater than an upper bound of practical equivalence, against the alternative hypothesis that the difference is within an interval considered practically equivalent. When the test p-value is small you can reject the null hypothesis and conclude that the tests are equivalent.

Non-inferiority.
The null hypothesis states that the difference from a standard test is greater than the smallest practical difference against the alternative hypothesis that the difference from the standard test is less than the smallest practical difference. When the test p-value is small you can reject the null hypothesis and conclude that the test is not inferior to the standard test.

Therefore, would a p-value below 0.05 mean that the two AUCs are not equal but are equivalent?

Thanks

Best Answer

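It means that your AUCs are different. roc.test performs the equality test (the null hypothesis is that the difference in AUC is 0), so a p-value of 0.037 rejects equality at the 0.05 level. Since you have not defined an equivalence region, you cannot say anything about equivalence anyway.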
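
To make the distinction concrete, here is a minimal sketch of both tests applied to the numbers reported in the question. It is written in plain Python and is not part of pROC. Several things are assumptions: the function name auc_difference_tests is made up for illustration, the equivalence margin of 0.05 is an arbitrary placeholder that you would have to justify for your own application, the standard error is backed out of the reported statistic as SE = difference / D, and a normal approximation replaces the t distribution with df = 620.43 that roc.test used (so the equality p-value comes out close to, but not exactly, 0.03681). The equivalence test is the usual two one-sided tests (TOST) procedure.

```python
from statistics import NormalDist


def auc_difference_tests(auc1, auc2, se_diff, margin):
    """Equality and TOST equivalence tests for an AUC difference.

    Uses a normal (Z) approximation. `margin` is the half-width of the
    equivalence region and must be chosen by the analyst beforehand.
    """
    z = NormalDist()
    diff = auc1 - auc2

    # Equality: H0 is diff == 0, two-sided alternative.
    p_equality = 2 * (1 - z.cdf(abs(diff / se_diff)))

    # Equivalence (TOST): H0 is diff <= -margin OR diff >= +margin.
    p_lower = 1 - z.cdf((diff + margin) / se_diff)  # against H0: diff <= -margin
    p_upper = z.cdf((diff - margin) / se_diff)      # against H0: diff >= +margin
    p_equivalence = max(p_lower, p_upper)

    return diff, p_equality, p_equivalence


# Values copied from the roc.test output in the question.
auc1, auc2 = 0.7239011, 0.8135604
d_stat = -2.0924

# Assumption: D = (auc1 - auc2) / SE, so the SE can be recovered from D.
se_diff = (auc1 - auc2) / d_stat

# Hypothetical equivalence margin; NOT taken from the question.
margin = 0.05

diff, p_eq, p_equiv = auc_difference_tests(auc1, auc2, se_diff, margin)
print(f"difference in AUC    = {diff:.4f}")
print(f"equality p-value     = {p_eq:.4f}")
print(f"equivalence p-value  = {p_equiv:.4f}")
```

Under these assumptions the equality p-value is small while the equivalence p-value is large: equality is rejected and equivalence is not established. A small equality p-value can never be read as evidence of equivalence; the two tests have different null hypotheses.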
