Solved – the best way to calculate the AUC of a ROC curve

r, roc, sensitivity-specificity

I have a ROC curve for which I'd like to calculate the AUC. I'm getting different values from the trapezoidal and rank-based approaches. What I'm noticing is that the two values actually add to 1.0, and the ROC curve itself suggests that the value from the trapezoidal rule is correct. Any ideas on what's going on?

Here's an example dataset (with code lifted from How to calculate Area Under the Curve (AUC), or the c-statistic, by hand)…

norm <- c(0.184, 0.250, 0.462, 0.424, 0.436, 0.136, 0.078, 0.166, 0.042, 0.542, 0.274, 0.130, 0.210, 0.364, 0.276, 0.262, 0.284, 0.138, 0.242, 0.092, 0.104, 0.070, 0.260, 0.320, 0.342, 0.168, 0.108, 0.068, 0.060, 0.220, 0.038, 0.090, 0.096, 0.480, 0.424, 0.060, 0.394, 0.226, 0.056, 0.250, 0.122, 0.532, 0.460, 0.088, 0.470, 0.070, 0.480, 0.216, 0.098, 0.586, 0.154, 0.620, 0.094, 0.534, 0.070, 0.240, 0.226, 0.762, 0.110, 0.202, 0.076, 0.436, 0.514, 0.390, 0.254, 0.254, 0.140, 0.192, 0.500, 0.226, 0.690, 0.158, 0.522, 0.306, 0.588, 0.060, 0.130, 0.450, 0.034, 0.280, 0.510, 0.042, 0.256, 0.062, 0.106, 0.104, 0.206, 0.346, 0.036, 0.192, 0.260, 0.212, 0.708, 0.118, 0.398, 0.290, 0.118, 0.532, 0.354, 0.422, 0.540, 0.202, 0.676, 0.544, 0.276, 0.066, 0.764, 0.230, 0.406, 0.572, 0.718, 0.008, 0.188, 0.260, 0.094, 0.406, 0.102, 0.050, 0.358, 0.384, 0.062, 0.298, 0.510, 0.722, 0.264)
abnorm <- c(0.090, 0.330, 0.052, 0.204, 0.376, 0.066, 0.362, 0.320, 0.278, 0.444, 0.504, 0.086, 0.170, 0.394, 0.384, 0.382, 0.152, 0.136, 0.098, 0.092, 0.154, 0.126, 0.502, 0.646, 0.086, 0.260, 0.108, 0.264, 0.246, 0.088, 0.154, 0.166, 0.028, 0.552, 0.218, 0.198, 0.186, 0.212, 0.040, 0.026, 0.110, 0.242, 0.096, 0.434, 0.134, 0.490, 0.302)
wi <- wilcox.test(abnorm, norm)
w <- wi$statistic
w/(length(abnorm)*length(norm))
#        W 
#0.4378723 


truestat=c(rep(0, length(norm)), rep(1, length(abnorm)))  # 1 = abnormal
testres=c(norm, abnorm)
tab=as.matrix(table(truestat, testres))
tot=colSums(tab)
truepos=unname(rev(cumsum(rev(tab[2,]))))
falsepos=unname(rev(cumsum(rev(tab[1,]))))
totpos=sum(tab[2,])
totneg=sum(tab[1,])
sens=truepos/totpos
omspec=falsepos/totneg
sens=c(sens,0)
omspec=c(omspec,0)

height = (sens[-1]+sens[-length(sens)])/2
width = -diff(omspec) # = diff(rev(omspec))
sum(height*width)
# [1] 0.5621277

When I use the ROC R package I get 0.438 and when I use the pROC package I get 0.562 – again, these add to 1.0, making me think something weird is going on. I know these are both awful AUC values, but it's a bit disconcerting to see this level of difference.
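A quick sanity check with made-up numbers (not my data above) suggests that two estimates summing to 1 is exactly what you'd expect if the two methods disagree about which group is the "positive" class: the rank-based AUC computed in one direction is the complement of the AUC computed in the other.

```r
# Toy scores (hypothetical, for illustration only)
a <- c(0.35, 0.8, 0.6)   # "events"
b <- c(0.1, 0.4, 0.2)    # "non-events"
n1 <- length(a); n2 <- length(b)

# Rank-based AUC in each direction
auc_ab <- unname(wilcox.test(a, b)$statistic) / (n1 * n2)  # estimates P(a > b)
auc_ba <- unname(wilcox.test(b, a)$statistic) / (n1 * n2)  # estimates P(b > a)

# With ties counted as 1/2, the two directions always sum to 1
auc_ab + auc_ba  # 1
```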

Best Answer

First of all, you didn't state why the ROC curve itself is relevant to the problem at hand. Since ROC curves are inconsistent with individual decision making and are based on backwards-time probabilities, it is hard to think of an example where ROC curves are helpful.

The $c$-index is the accepted nonparametric AUROC estimator. You can get it from the Wilcoxon test as you have done, or more directly from the somers2 function in the R Hmisc package, whose main code is (mean(rank(x)[y == 1]) - (n1 + 1) / 2) / n2.
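To see what that mean-rank formula computes, here is a small base-R sketch with made-up scores (not the poster's data): it agrees with brute-force counting of concordant (event, non-event) pairs, ties counted as 1/2.

```r
# Hypothetical scores x and 0/1 labels y (1 = event)
x <- c(0.1, 0.4, 0.35, 0.8, 0.2, 0.6)
y <- c(0,   0,   1,    1,   0,   1)
n1 <- sum(y == 1)   # number of events
n2 <- sum(y == 0)   # number of non-events

# The somers2-style mean-rank formula for the c-index
cindex <- (mean(rank(x)[y == 1]) - (n1 + 1) / 2) / n2

# Brute force: fraction of (event, non-event) pairs in which the
# event has the higher score, counting exact ties as 1/2
conc <- outer(x[y == 1], x[y == 0], ">") +
        0.5 * outer(x[y == 1], x[y == 0], "==")
all.equal(cindex, mean(conc))  # TRUE; both are 8/9
```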

require(Hmisc)
somers2(c(norm, abnorm), c(rep(0, length(norm)), rep(1, length(abnorm))))

          C         Dxy           n     Missing 
  0.4378723  -0.1242553 172.0000000   0.0000000 

You should be able to replicate this by using all possible cutpoints that change sens and spec, together with proper use of the trapezoidal rule.
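One sketch of that replication, again with toy scores rather than the data above: sweep every observed cutpoint, compute sens and 1 − spec with the event group on top, and apply the trapezoidal rule. It reproduces the rank-based $c$ for the same toy data. Note that treating the wrong group as the events swaps sens and omspec, reflecting the curve across the diagonal and yielding 1 minus the AUC — consistent with the two values in the question summing to 1.

```r
# Hypothetical scores x and 0/1 labels y (1 = event)
x <- c(0.1, 0.4, 0.35, 0.8, 0.2, 0.6)
y <- c(0,   0,   1,    1,   0,   1)

# Sensitivity and 1 - specificity at every observed cutpoint,
# predicting "positive" when x >= cutpoint
cuts   <- sort(unique(x))
sens   <- sapply(cuts, function(t) mean(x[y == 1] >= t))
omspec <- sapply(cuts, function(t) mean(x[y == 0] >= t))

# Close the curve at (1, 1) and (0, 0), then apply the trapezoidal rule
sens   <- c(1, sens, 0)
omspec <- c(1, omspec, 0)
height <- (sens[-1] + sens[-length(sens)]) / 2
width  <- -diff(omspec)
sum(height * width)   # 8/9, matching the rank-based c-index for these data
```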

If $Y=1$ is the correct coding for abnorm then your discrimination ability is worse than random guesses.