Confidence intervals on a cutpoint of a ROC curve

Tags: bootstrap, confidence interval, r, roc

Can one estimate and calculate a confidence interval for the value of a cutpoint obtained from the ROC curve?

For example, using the pROC package in R:

> library(pROC)
> data(aSAH)
> roc1 <- roc(aSAH$outcome,
            aSAH$s100b, percent=TRUE,
            # arguments for ci
            ci=TRUE, boot.n=100, ci.alpha=0.9, stratified=FALSE,
            # arguments for plot
            plot=TRUE, auc.polygon=TRUE, max.auc.polygon=TRUE, grid=TRUE,
            print.auc=TRUE, show.thres=TRUE)

Requesting confidence intervals at each threshold with:

> ci.thresholds(roc1)

produces:

95% CI (2000 stratified bootstrap replicates):
 thresholds  sp.low sp.median sp.high se.low se.median se.high
       -Inf   0.000      0.00    0.00 100.00    100.00  100.00
      0.065   6.944     13.89   22.22  92.68     97.56  100.00
      0.075  12.500     22.22   31.94  80.49     90.24   97.56
      0.085  20.830     30.56   41.67  77.99     87.80   97.56
      0.095  27.780     38.89   50.00  70.73     82.93   92.68
      0.105  37.500     48.61   59.72  65.85     78.05   90.24
      0.115  43.060     54.17   65.28  60.98     75.61   87.80
      0.135  47.220     58.33   69.44  53.66     68.29   80.49
      0.155  58.330     69.44   80.56  51.22     65.85   80.49
      0.205  70.830     80.56   88.89  48.78     63.41   78.05
      0.245  73.580     81.94   90.28  43.90     58.54   73.17
      0.290  73.610     83.33   91.67  34.15     51.22   65.85
      0.325  76.350     84.72   93.06  29.27     46.34   60.98
      0.345  79.170     87.50   94.44  29.27     43.90   58.54
      0.395  80.560     88.89   95.83  26.83     41.46   56.10
      0.435  83.330     90.28   95.87  24.39     39.02   53.66
      0.475  90.280     95.83  100.00  19.51     34.15   48.78
      0.485  93.060     97.22  100.00  17.07     31.71   46.34
      0.510 100.000    100.00  100.00  14.63     29.27   43.90

QUESTION

Why is there no CI on the thresholds themselves?

UPDATE

I worked out how to select the best cutpoint with the "topleft" criterion instead of Youden's index:

rocobj <- plot.roc(aSAH$outcome, 
                   aSAH$s100b,  
                   main="Confidence intervals", 
                   percent=TRUE,  ci=TRUE, print.auc=TRUE) 
# print the AUC (will contain the CI)  
ciobj <- ci.se(rocobj, 
               specificities=seq(0, 100, 5)) 
plot(ciobj, type="shape", col="#1c61b6AA")
plot(ci(rocobj, of="thresholds", thresholds="best", best.method="topleft")) 

Best Answer

You fixed the thresholds, so they cannot vary across the bootstrap replicates.

Let's simplify and look at only one threshold:

> ci.thresholds(roc1, thresholds = 0.205)
95% CI (2000 stratified bootstrap replicates):
 thresholds sp.low sp.median sp.high se.low se.median se.high
      0.205  70.83     80.56   88.89  48.78     63.41   78.05

By doing that, you fixed the threshold at 0.205 and asked: how much can my sensitivity and specificity vary at that threshold? The threshold is your fixed point, so it has no uncertainty associated with it. You implicitly did the same when you asked for all thresholds, even though you didn't spell it out.
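To make this concrete, here is a minimal base-R sketch of a stratified percentile bootstrap at a fixed threshold. It uses synthetic data rather than aSAH, and the helper `se_sp`, the threshold value 0.5 and the "positive if marker > threshold" rule are assumptions made for illustration. It is not pROC's implementation.

```r
# Synthetic data (an assumption for illustration; this is not aSAH)
set.seed(1)
n <- 100
outcome <- rep(c(0, 1), each = n / 2)            # 0 = control, 1 = case
marker  <- c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 1))

threshold <- 0.5   # chosen once by us and never re-estimated

# Sensitivity and specificity at a given threshold
# (hypothetical helper; a subject is called positive if marker > threshold)
se_sp <- function(outcome, marker, threshold) {
  c(se = mean(marker[outcome == 1] > threshold),
    sp = mean(marker[outcome == 0] <= threshold))
}

# Stratified bootstrap: resample controls and cases separately,
# then recompute se and sp at the SAME threshold every time
boot <- replicate(2000, {
  idx <- c(sample(which(outcome == 0), replace = TRUE),
           sample(which(outcome == 1), replace = TRUE))
  se_sp(outcome[idx], marker[idx], threshold)
})

# Percentile intervals for se and sp; one row per measure
ci_fixed <- t(apply(boot, 1, quantile, probs = c(0.025, 0.5, 0.975)))
print(ci_fixed)
```

Only `se` and `sp` change from one replicate to the next. The threshold is an input to every replicate, so there is nothing to compute an interval from.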

If you want a CI on the threshold, you have to reformulate the question. For instance: how uncertain is the "best" threshold?

> ci.coords(roc1, x="best", ret="threshold")
95% CI (2000 stratified bootstrap replicates):
                           2.5%   50% 97.5%
threshold best: threshold 0.155 0.205 0.485
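The same idea can be sketched by hand in base R: re-estimate the "best" threshold in every bootstrap replicate and take percentiles of the results. The data are synthetic, `best_threshold` is a hypothetical helper that maximises Youden's index over midpoints between observed marker values, and ties are broken by taking the first maximiser. pROC's own procedure may differ in these details.

```r
# Synthetic data (an assumption for illustration; this is not aSAH)
set.seed(1)
n <- 100
outcome <- rep(c(0, 1), each = n / 2)            # 0 = control, 1 = case
marker  <- c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 1))

# Hypothetical helper: threshold maximising Youden's J = se + sp - 1.
# Candidates are midpoints between consecutive distinct marker values.
best_threshold <- function(outcome, marker) {
  v <- sort(unique(marker))
  cand <- (v[-1] + v[-length(v)]) / 2
  youden <- sapply(cand, function(th) {
    mean(marker[outcome == 1] > th) + mean(marker[outcome == 0] <= th) - 1
  })
  cand[which.max(youden)]   # first maximiser if several thresholds tie
}

# Stratified bootstrap: the threshold is re-estimated in each replicate,
# so this time it does vary
boot_best <- replicate(2000, {
  idx <- c(sample(which(outcome == 0), replace = TRUE),
           sample(which(outcome == 1), replace = TRUE))
  best_threshold(outcome[idx], marker[idx])
})

# Percentile interval for the best threshold
ci_best <- quantile(boot_best, probs = c(0.025, 0.5, 0.975))
print(ci_best)
```

Here the threshold is an output of each replicate rather than an input, which is why an interval for it exists.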

Now of course you could ask: let's fix the sensitivity at X = 0.9 and see how the threshold varies there. But this is trickier than it sounds. Most of the ROC curve is a set of line segments joining discrete points, so most points on the curve fall between two thresholds. To obtain a threshold at an arbitrary point, you would need to interpolate a threshold value there. This is doable, but it requires some parametric assumptions about the distribution of thresholds. Would you interpolate linearly? That would be a pretty bad choice for most datasets I have seen.
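A small base-R sketch shows why a fixed sensitivity rarely corresponds to an actual threshold. It assumes 41 synthetic cases (a number chosen for illustration) and the rule "positive if marker > threshold"; `sens_at` is a hypothetical helper. With 41 cases, sensitivity can only take the values 0/41, 1/41, ..., 41/41, and 0.9 is not one of them.

```r
# Synthetic marker values for 41 cases (an assumption for illustration)
set.seed(1)
cases <- sort(rnorm(41, mean = 1))
m <- length(cases)

# Hypothetical helper: empirical sensitivity at a threshold
sens_at <- function(th) mean(cases > th)

target <- 0.9

# Number of sorted cases that must fall below the threshold to land on the
# attainable sensitivities just above and just below the target
i_hi <- m - ceiling(target * m)   # leaves ceiling(0.9 * 41) cases above
i_lo <- m - floor(target * m)     # leaves floor(0.9 * 41) cases above

# Midpoint thresholds that realise those two sensitivities
th_hi <- (cases[i_hi] + cases[i_hi + 1]) / 2
th_lo <- (cases[i_lo] + cases[i_lo + 1]) / 2

# The target sits strictly between two neighbouring attainable values
print(c(above = sens_at(th_hi), target = target, below = sens_at(th_lo)))
print(c(th_hi = th_hi, th_lo = th_lo))
```

Any threshold assigned to a sensitivity of exactly 0.9 would have to come from an interpolation rule between `th_hi` and `th_lo`, and that choice of rule is where the extra assumptions enter.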

So far, pROC is not able to reliably calculate a threshold at an arbitrary coordinate, so questions like "how uncertain is the threshold at a fixed sensitivity X?" result in errors or missing values. It's a can of worms I tried to open, but I never reached anything useful.
