Solved – Relationship between AUC and U Mann-Whitney statistic

auc, roc, wilcoxon-mann-whitney-test

Recently I learned about the relationship between the area under the ROC curve (AUC) and the $U$ statistic of the Wilcoxon-Mann-Whitney test. It is supposed to obey the following rule (taken from this nice post on Quora: https://www.quora.com/How-is-statistical-significance-determined-for-ROC-curves-and-AUC-values):

$$AUC = \frac{U}{n_1n_2}$$

It looks convincing, but I ran some checks on real data in R and found that, although there is indeed a functional relationship between $U$ and $AUC$, it has a slightly different form:

$$AUC = 1 - \frac{U}{n_1n_2}$$

Unfortunately I cannot share the real data I used, but here is a simple simulation that illustrates the point:

library(PredictABEL)
set.seed(303)
# Two simulated groups: "a" has the lower mean, "b" the higher one
x1 <- rnorm(40, 20, 4)
x2 <- rnorm(50, 30, 10)
y <- c(rep("a", 40), rep("b", 50))
# stringsAsFactors=TRUE keeps y a factor, which was the default before R 4.0
df <- data.frame(x=c(x1, x2), y=y, stringsAsFactors=TRUE)
mod <- glm(y ~ x, data=df, family=binomial)
plotROC(df, 2, mod$fitted.values)       # AUC = 0.81
auc <- 0.81
utest <- wilcox.test(x ~ y, data=df)
utest$statistic / prod(table(df$y))     #  = 0.19
1 - utest$statistic / prod(table(df$y)) #  = 0.81 = AUC

So, as you can see, I am a bit confused. I am fairly sure I am simply overlooking something important, so I would be really thankful if someone could shed some light on this.

EDIT:
So the question is: which of the two formulas is correct? Every source I have checked gives the first one, but the data I checked suggest the second.

Best Answer

OK, I found the answer and, as I expected, it is trivial. The value of the $U$ statistic depends on which group it is computed for (this does not affect the test result in any way). In my code the statistic was computed for the group with the smaller mean, so it measured support for the hypothesis that this group dominates the group with the higher mean. That is of course not true, which is why $U$ was small.
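To spell out why the two values are complementary, here is a short sketch. The notation $U_a$, $U_b$ is introduced here for illustration only (it is not in the original post): $U_a$ is the statistic computed for the group with the smaller mean, $U_b$ the one computed for the group with the higher mean, and ties are assumed to be absent.

$$U_a = \#\{(i,j) : x_{a,i} > x_{b,j}\}, \qquad U_b = \#\{(i,j) : x_{b,j} > x_{a,i}\}$$

Each of the $n_1n_2$ pairs is counted in exactly one of the two statistics, so

$$U_a + U_b = n_1n_2 \quad\Longrightarrow\quad \frac{U_b}{n_1n_2} = 1 - \frac{U_a}{n_1n_2}$$

If the ROC curve treats the higher-mean group as the positive class, the AUC corresponds to $U_b$, which is consistent with both formulas in the question.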

After switching the direction of the comparison, so that the Wilcoxon-Mann-Whitney test checks whether the group with the higher mean dominates the one with the lower mean (which is true), I got the expected relationship between $U$ and $AUC$, that is $AUC = \frac{U}{n_1n_2}$. So both formulas are correct; they just refer to $U$ computed for different groups.
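A minimal sketch of that switch, using the same simulated data as in the question. It is not the exact code I ran: it avoids PredictABEL and computes the AUC by direct pair counting, on the assumption that the fitted probabilities of the logistic model are an increasing function of x (so ranking by x gives the same ROC curve). The names u_a, u_b and auc_pairs are made up for this illustration.

```r
set.seed(303)
x1 <- rnorm(40, 20, 4)    # group "a", lower mean
x2 <- rnorm(50, 30, 10)   # group "b", higher mean
n1 <- length(x1)
n2 <- length(x2)

# U for each direction of the comparison.
# wilcox.test(x, y) reports the statistic for its first argument,
# i.e. the number of pairs in which x exceeds y (ties count 1/2).
u_a <- unname(wilcox.test(x1, x2)$statistic)  # "a" over "b"
u_b <- unname(wilcox.test(x2, x1)$statistic)  # "b" over "a"

# AUC by direct pair counting: share of (b, a) pairs in which the
# "b" value is larger, with ties counted as 1/2.
auc_pairs <- mean(outer(x2, x1, ">") + 0.5 * outer(x2, x1, "=="))

# The two U values add up to the number of pairs ...
all.equal(u_a + u_b, n1 * n2)
# ... so both formulas give the same AUC, depending on which U is used.
all.equal(u_b / (n1 * n2), auc_pairs)
all.equal(1 - u_a / (n1 * n2), auc_pairs)
```

With the formula interface, the same switch can be made by changing the order of the factor levels (for example with relevel()), since wilcox.test(x ~ y) reports the statistic for the first level of y.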
