I have tried now several packages in R for Cox proportional hazard regression and I am having a hard time to find a function where I can fit a model using cross validation/bootstrapping and obtain an estimate of the C-index based on the untrained items with a confidence interval.
The PEC package comes close, however it does not provide a confidence interval. Does anybody know a package which does provide it? Or a formula to calculate it myself?
Best Answer
The simplest way to get empirically based confidence intervals is bootstrapping. If you do 999 bootstraps and rank your statistic of interest in ascending order, then the 25th and the 975th provide a simple estimate of 95% confidence intervals. I don't know of any advantages of cross-validation for this purpose, and you would have to do multiple cross-validations to get that good an estimate in any event. Bootstrapping has the advantage to being as close as you can get to resampling from the original population.
So if you already have a program that produces your statistic of interest from a set of bootstrap samples (like
validate()
from therms
package, which works on the output from the package'scph()
function), that's the easiest way to proceed. Functions likevalidate()
know how to keep track of parameter estimates from the original model separately from those based on the bootstrap resamples, which seems to be of some concern to you.*There are other ways to estimate CIs from bootstraps. Bootstrapping Regression Models in R shows how to use facilities provided by R packages to get such estimates. It provides the approach for calculating the potentially better bias-corrected, accelerated (or $BC_a$) percentile intervals.
The
boot
package can automate this process in general. First you write a function that produces your statistic of interest from a dataset (first argument to your function) based on a supplied vector of indices (second argument). There's a simple example of the structure of such a function on page 194 of ISLR, to augment the somewhat more opaque examples on theboot()
manual page. Then theboot()
function does the resampling of the cases, passing the vector of chosen indices each time to your function to calculate your statistic. You can use theboot.ci()
function on the output fromboot()
to get 5 different types of CI estimates, including $BC_a$.Finally, with respect to the C-index in particular, pay attention to how you intend to use it. It's not, for example, a very sensitive test for model selection according to Frank Harrell, who developed that index and also is responsible for the
rms
package.*If you have questions about how the calculations are performed you can get the code for functions written in R by typing the function name at the R prompt, sometimes preceded by the package name. After the
rms
package is loaded, typerms:::validate.cph
to get its code, andpredab.resample
to see the workhorse function that does most of the work.