Statistics for Area under the ROC curve

auc, computational-statistics, confidence interval, roc

I have a question regarding statistical evaluation of the AUC.
In their paper (http://www.jstor.org/stable/2531595), DeLong et al. describe a method for the statistical evaluation of AUCs. (Another good explanation can be found in the book "Statistics with Confidence: Confidence Intervals and Statistical Guidelines" by Altman et al.)

As far as I understand, we compute the $\text{AUC}$ and its standard deviation $\sigma$ from the kernel matrix. Assuming the normal distribution $\mathcal{N}(\text{AUC},\sigma)$, it is then possible to compute confidence intervals.

My question is about the normality assumption:

  1. The $\text{AUC}$ lies in the interval $[0,1]$, but the support of the normal distribution is $(-\infty, \infty)$. Is this problem really negligible? (The pROC package, for example, handles it by simply restricting the CI to $[0,1]$.)

  2. The $\text{Beta}$ distribution is defined on the interval $[0,1]$ and has the shape parameters $\alpha$ and $\beta$. Can we estimate them from the data, as we can for the AUC?

To give an example: given the vector c(T,F,F,F,T,F,F,T,F,F), we get $\text{AUC} = 0.619$ and $\sigma = 0.237$, which results in the 95% CI $(0.156, 1.083)$.

    library(pROC)
    # The positions of the TRUE and FALSE entries serve as the scores
    temp.in <- c(T,F,F,F,T,F,F,T,F,F)
    r <- pROC::roc(controls=which(temp.in), cases=which(!temp.in))
    pROC::auc(r)     # point estimate of the AUC
    pROC::ci.auc(r)  # 95% CI of the AUC
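
For reference, the same numbers can be reproduced without pROC. The following is a minimal base-R sketch, assuming I have read the DeLong estimator correctly: the kernel matrix compares every case with every control, and its row and column means give the components whose variances are combined into $\sigma$. The names `psi`, `v10` and `v01` are my own and not part of pROC.

```r
# Positions of the TRUE (controls) and FALSE (cases) entries act as scores,
# mirroring the pROC call above.
temp.in  <- c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
controls <- which(temp.in)
cases    <- which(!temp.in)

# Mann-Whitney kernel: 1 if case > control, 0.5 for ties, 0 otherwise.
# Rows index cases, columns index controls.
psi <- outer(cases, controls, function(x, y) (x > y) + 0.5 * (x == y))

auc <- mean(psi)        # empirical AUC
v10 <- rowMeans(psi)    # one component per case
v01 <- colMeans(psi)    # one component per control

# Standard deviation of the AUC from the two sample variances
sigma <- sqrt(var(v10) / length(cases) + var(v01) / length(controls))

# Normal-approximation 95% CI, not restricted to [0, 1]
ci <- auc + c(-1, 1) * qnorm(0.975) * sigma
round(c(auc = auc, sigma = sigma, lower = ci[1], upper = ci[2]), 3)
```

The upper limit exceeds 1 here, which is exactly the problem raised in point 1.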

Instead of using the normal distribution, I would like to use the $\text{Beta}$ distribution. But how can we estimate $\alpha$ and $\beta$ for the $\text{Beta}$ distribution given c(T,F,F,F,T,F,F,T,F,F)?

Best Answer

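An alternative given by [1] is to compute the interval on the logit scale:

$ \log \left( \frac{\text{AUC}}{1-\text{AUC}} \right) \pm \Phi^{-1} \left( 1 - \frac{\alpha}{2} \right) \frac{\sqrt{\widehat{\text{var}}(\text{AUC})}}{\text{AUC}(1 - \text{AUC})} $

Here $\Phi^{-1}$ is the standard normal quantile function and $\alpha$ is the significance level ($0.05$ for a 95% CI), not the $\text{Beta}$ shape parameter. Transforming the two endpoints back with the inverse logit gives an asymmetric interval that always lies within $(0,1)$. In your case, with $\text{AUC} = 0.619$ and $\sigma = 0.237$, you would get a 95% CI of roughly $(0.18, 0.92)$.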
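
A minimal R sketch of this logit interval, assuming the standard error on the logit scale comes from the delta method (the standard deviation of the AUC divided by AUC(1 - AUC)) and plugging in the AUC and $\sigma$ reported in the question. The variable names are illustrative only.

```r
auc   <- 0.619   # AUC reported in the question
sigma <- 0.237   # standard deviation of the AUC reported in the question
level <- 0.95

# Standard normal quantile for a two-sided interval
z <- qnorm(1 - (1 - level) / 2)

# Delta method: standard error of logit(AUC)
se_logit <- sigma / (auc * (1 - auc))

# Symmetric interval on the logit scale ...
ci_logit <- qlogis(auc) + c(-1, 1) * z * se_logit

# ... mapped back to the AUC scale with the inverse logit
ci_auc <- plogis(ci_logit)
round(ci_auc, 2)
```

Because the endpoints are mapped back with `plogis`, the interval cannot leave $(0,1)$, and it is asymmetric around the point estimate.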

If you frequently deal with high AUCs and small sample sizes, you may want to have a look at [2], which shows that no single method computes optimal confidence intervals for all ROC curves.


[1] Pepe MS, The Statistical Evaluation of Medical Tests for Classification and Prediction, OUP 2003, p. 107

[2] Obuchowski NA, Lieber ML, Confidence bounds when the estimated ROC area is 1.0, Acad Radiol. 2002, 9 (5) p. 526-30