Solved – C-index for survival analysis with confidence interval (R)

confidence intervalcox-modelrsurvival

I have tried now several packages in R for Cox proportional hazard regression and I am having a hard time to find a function where I can fit a model using cross validation/bootstrapping and obtain an estimate of the C-index based on the untrained items with a confidence interval.

The PEC package comes close, however it does not provide a confidence interval. Does anybody know a package which does provide it? Or a formula to calculate it myself?

Best Answer

The simplest way to get empirically based confidence intervals is bootstrapping. If you do 999 bootstraps and rank your statistic of interest in ascending order, then the 25th and the 975th provide a simple estimate of 95% confidence intervals. I don't know of any advantages of cross-validation for this purpose, and you would have to do multiple cross-validations to get that good an estimate in any event. Bootstrapping has the advantage to being as close as you can get to resampling from the original population.

So if you already have a program that produces your statistic of interest from a set of bootstrap samples (like validate() from the rms package, which works on the output from the package's cph() function), that's the easiest way to proceed. Functions like validate() know how to keep track of parameter estimates from the original model separately from those based on the bootstrap resamples, which seems to be of some concern to you.*

There are other ways to estimate CIs from bootstraps. Bootstrapping Regression Models in R shows how to use facilities provided by R packages to get such estimates. It provides the approach for calculating the potentially better bias-corrected, accelerated (or $BC_a$) percentile intervals.

The boot package can automate this process in general. First you write a function that produces your statistic of interest from a dataset (first argument to your function) based on a supplied vector of indices (second argument). There's a simple example of the structure of such a function on page 194 of ISLR, to augment the somewhat more opaque examples on the boot() manual page. Then the boot() function does the resampling of the cases, passing the vector of chosen indices each time to your function to calculate your statistic. You can use the boot.ci() function on the output from boot() to get 5 different types of CI estimates, including $BC_a$.

Finally, with respect to the C-index in particular, pay attention to how you intend to use it. It's not, for example, a very sensitive test for model selection according to Frank Harrell, who developed that index and also is responsible for the rms package.


*If you have questions about how the calculations are performed you can get the code for functions written in R by typing the function name at the R prompt, sometimes preceded by the package name. After the rms package is loaded, type rms:::validate.cph to get its code, and predab.resample to see the workhorse function that does most of the work.

Related Question