Solved – Which standard error formula for the area under the ROC curve should I use?

classification, hypothesis testing, roc

I am trying to assess the performance of a procedure for identifying something abnormal in the cities of a particular state in the USA. I tested this procedure on six different cities, each with a different prevalence of positives and negatives. I have already constructed the ROC curve and computed the nonparametric AUC for each of the six cities. I now want to treat those six cities as a sample drawn from that state, so I need to express the AUC of the procedure with a confidence interval. In this case, I have six AUC values (n=6) and would like to use 95% confidence. The question is: should I use the conventional t-distribution (for n<30) to calculate the confidence interval for the AUC, or the James Hanley method?

If I use the Hanley method, I need to calculate the standard error separately for each of the six cities, since the numbers of negatives and positives differ, so I will end up with six different standard errors. If I use the conventional t-distribution, only one standard error is calculated, which seems to be how it is supposed to be. Please help.
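To make the two options in the question concrete, here is a minimal Python sketch. It assumes the Hanley-McNeil (1982) variance formula for a single nonparametric AUC and an ordinary t-interval across the six city-level AUCs. The function names and the six (AUC, positives, negatives) triples are hypothetical illustration values, not data from the question.

```python
import math
from statistics import mean, stdev


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of one nonparametric AUC (Hanley & McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)


def t_interval(values, t_crit):
    """Mean +/- t_crit * (sample SD / sqrt(n)); returns (low, high, se)."""
    m = mean(values)
    se = stdev(values) / math.sqrt(len(values))
    return m - t_crit * se, m + t_crit * se, se


# Hypothetical (auc, n_pos, n_neg) for six cities; illustration only.
cities = [(0.82, 40, 160), (0.77, 25, 175), (0.85, 60, 140),
          (0.79, 30, 170), (0.88, 50, 150), (0.74, 20, 180)]

# Option 1: one within-city standard error per city.
per_city_se = [hanley_mcneil_se(a, p, n) for a, p, n in cities]

# Option 2: a single between-city standard error from the six AUC values.
T_CRIT_DF5 = 2.5706  # two-sided 95% critical value of t with 5 degrees of freedom
low, high, se_between = t_interval([a for a, _, _ in cities], T_CRIT_DF5)

print("per-city SEs:", [round(s, 4) for s in per_city_se])
print("between-city SE:", round(se_between, 4), "95% CI:", (round(low, 3), round(high, 3)))
```

The two quantities measure different things: the Hanley-McNeil values describe sampling uncertainty within each city, while the t-interval describes variation of the AUC between cities.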

Best Answer

I think you will have trouble convincing others that any particular 6 cities constitute a representative sample of all the cities in the state. I answer in the expectation that this is the beginning of a larger-scale analysis based on many more cities.

First, although the Hanley-McNeil paper you cite is a useful introduction to the principles of ROC and AUC, some statistical considerations have become better appreciated in the intervening 30+ years. In particular, AUC values based on a particular data set tend to overestimate the true classification performance when the classifier is applied to another data set, or even to another sample from the same population. There are well-established tools for estimating this "optimism" in AUC values, which as a side effect provide the "bootstrap" estimates of the standard errors of the AUC values that you want. The rms package in R is one example; see this page for a recent discussion on this site and links to further information.
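As a rough illustration of the bootstrap idea, the sketch below resamples cases with replacement and recomputes the nonparametric AUC each time. It is a simplification and not what the rms package does: a full optimism correction refits the model in every resample, whereas here the scores are treated as fixed. The function names and the simulated scores are assumptions made for the example.

```python
import random
from statistics import stdev


def auc_mann_whitney(scores, labels):
    """Nonparametric AUC: P(pos score > neg score) + 0.5 * P(tie)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_se(scores, labels, n_boot=500, seed=0):
    """Bootstrap SE and approximate 95% percentile interval for the AUC."""
    rng = random.Random(seed)
    n = len(scores)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the classes
            aucs.append(auc_mann_whitney([scores[i] for i in idx], ys))
    aucs.sort()
    lo = aucs[int(0.025 * n_boot)]
    hi = aucs[int(0.975 * n_boot) - 1]
    return stdev(aucs), (lo, hi)


# Simulated scores for one hypothetical city: 30 positives, 70 negatives.
data_rng = random.Random(1)
labels = [1] * 30 + [0] * 70
scores = [data_rng.gauss(1.0, 1.0) if y == 1 else data_rng.gauss(0.0, 1.0)
          for y in labels]

se, (ci_low, ci_high) = bootstrap_auc_se(scores, labels)
print("AUC:", round(auc_mann_whitney(scores, labels), 3))
print("bootstrap SE:", round(se, 4), "95% interval:", (round(ci_low, 3), round(ci_high, 3)))
```

Because the scores are held fixed, this gives only the sampling variability of the AUC; it does not estimate the optimism that comes from fitting and evaluating a model on the same data.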

Your project, however, may be better served by combining data from all of your cities into a single analysis rather than doing individual analyses for each of the cities and then pooling. You don't provide many details about your analysis approach leading to the ROC/AUC, but there are 2 general possibilities: either the relations of all predictor variables to the +/- classification are the same from city to city, or some aren't.

If all the relations of predictors to +/- classification are independent of city, then you lose nothing by including all cases together in a single analysis, and your estimates will be more precise due to the larger number of cases.

If you suspect that the coefficients relating some predictors to the +/- classification will differ among cities, then you should be able to include that possibility in your analysis by treating those coefficients as random effects, associated with the particular cities, in a combined single model drawing on all your data together. Such a "mixed effects" model will provide information about the inter-city differences in such coefficients, and after optimism/bootstrap analysis will provide a useful gauge of the expected error in your classification scheme.
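One way to write down such a model, as a sketch with a single predictor whose coefficient is allowed to vary by city (the symbols are assumptions introduced here, not notation from the question):

$$
\operatorname{logit} \Pr(y_{ij} = 1) = \beta_0 + (\beta_1 + b_{j})\, x_{ij}, \qquad b_{j} \sim \mathcal{N}(0, \tau^2)
$$

Here $i$ indexes cases and $j$ indexes cities, $y_{ij}$ is the +/- classification, $x_{ij}$ is the predictor, $\beta_0$ and $\beta_1$ are the fixed intercept and slope shared by all cities, and $b_j$ is the city-specific deviation of the slope. The estimate of $\tau^2$ summarizes the inter-city differences; if it is near zero, the model reduces to the pooled analysis in which the relation is the same in every city.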