Solved – Can someone sort me out regarding the calculation of AUC

auc, classification

I am having some trouble with two different implementations of a classification problem giving different results. My colleague, who wrote the other implementation, and I have narrowed the problem down to the way we calculate the area under the receiver operating characteristic curve (AUC). One solution is derived from a formula that appears in at least one source: [1]

$$AUC_1 = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\mathbf{1}_{p_i<p_j}$$

I have ported the implementation into R and compared it with the result $AUC_2$ from the R-package pROC:

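auc1 <- function(p) {
  values <- p[,1]
  positives <- values[ p[,2]==1 ]
  negatives <- values[ p[,2]==0 ]
  count <- 0
  # loop variables renamed so they no longer shadow the argument p
  for ( pos in positives ) {
    for ( neg in negatives ) {
      # counts only pairs where the positive scores strictly higher
      if ( pos>neg ) {
        count <- count + 1
      }
    }
  }
  return(count/(length(positives) * length(negatives)))
}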

auc2 <- function(p) {
  library(pROC)
  # roc() takes the true labels first, then the predicted scores
  r <- roc(p[,2], p[,1])
  return(auc(r))
}

predicted <- c(0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1)
real      <- c(0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1)
p <- cbind(predicted, real)

auc1(p)
auc2(p)

Can someone shed some light on why I get:

$$AUC_1 = 0.9090909 \mbox{ and } AUC_2 = 0.9545\mbox{?}$$

Best Answer

Go back to basics. The AUROC is mainly a good measure because of a coincidence: it equals the concordance probability ($c$-index; $U$-statistic) commonly used in rank correlation measures and the Wilcoxon-Mann-Whitney statistic. Concordance is an excellent measure of separation or predictive discrimination. So calculate it efficiently:

require(Hmisc)
somers2(predicted, real)
         C        Dxy          n    Missing 
 0.9545455  0.9090909 20.0000000  0.0000000 
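For the data in the question, the gap between the two numbers can be traced to tied scores. As a worked check, assume tied positive/negative pairs are credited one half, which is the convention of the Wilcoxon-Mann-Whitney statistic. The notation here is mine, not the question's: $s^+_i$ are the scores of the $m = 11$ positives and $s^-_j$ the scores of the $n = 9$ negatives.

$$C = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left(\mathbf{1}_{s^-_j<s^+_i} + \tfrac{1}{2}\,\mathbf{1}_{s^-_j=s^+_i}\right) = \frac{90 + \tfrac{1}{2}\cdot 9}{99} = 0.9545455$$

Of the 99 pairs, 90 have the positive scored strictly higher, 9 are tied (the one positive predicted as 0 against each of the 9 negatives), and none are discordant. Dropping the tie term leaves $90/99 = 0.9090909$, which is $AUC_1$; with no discordant pairs it also equals $D_{xy} = 2(C - 0.5)$.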

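The efficient calculation is essentially a one-liner in somers2, where x is the predictor, y the binary outcome, n the number of observations, and n1 the number of observations with y == 1: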

c.index <- (mean(rank(x)[y == 1]) - (n1 + 1)/2)/(n - n1)
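The rank formula can be checked against an explicit pairwise count without any packages. The following is a minimal sketch in base R, not the Hmisc or pROC source: the helper names `c_index_rank` and `c_index_pairs` and the `tie_weight` parameter are hypothetical, introduced only for illustration. It assumes the labels are coded 0/1 and that both classes are present.

```r
# Data from the question: binary predictions and true labels.
predicted <- c(0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1)
real      <- c(0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1)

# Rank-based concordance index (hypothetical helper name).
# rank() gives tied values their average rank by default,
# which is what credits tied pairs with one half.
c_index_rank <- function(x, y) {
  n  <- length(x)      # number of observations
  n1 <- sum(y == 1)    # number of positives
  (mean(rank(x)[y == 1]) - (n1 + 1) / 2) / (n - n1)
}

# Explicit pairwise count (hypothetical helper name).
# tie_weight = 0.5 credits ties one half; tie_weight = 0 ignores them.
c_index_pairs <- function(x, y, tie_weight = 0.5) {
  pos <- x[y == 1]
  neg <- x[y == 0]
  greater <- outer(pos, neg, ">")    # positive scored strictly higher
  tied    <- outer(pos, neg, "==")   # positive and negative scored equally
  (sum(greater) + tie_weight * sum(tied)) / (length(pos) * length(neg))
}

c_index_rank(predicted, real)                    # 0.9545455
c_index_pairs(predicted, real)                   # 0.9545455
c_index_pairs(predicted, real, tie_weight = 0)   # 0.9090909
```

With ties credited one half, both helpers reproduce the C value reported by somers2; with ties ignored, the pairwise count reproduces the 0.9090909 from the question's auc1. The pairwise version builds an m-by-n matrix, so it is only suitable for small checks like this one.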

But be clear on why you are using AUROC in the first place. It is a nice supplement to log-likelihood-based measures but not a substitute for the gold standard.