I've been researching the mice package, and I haven't yet discovered a way to use the multiple imputations to make a Cox model, then validate that model with the rms package's validate() function. Here is some sample code of what I have so far, using the data set veteran:
library(rms)
library(survival)
library(mice)
remove(veteran)
data(veteran)
veteran$trt=factor(veteran$trt,levels=c(1,2))
veteran$prior=factor(veteran$prior,levels=c(0,10))
#Set random data to NA
veteran[sample(137,4),1]=NA
veteran[sample(137,4),2]=NA
veteran[sample(137,4),7]=NA
impvet=mice(veteran)
survmod=with(veteran,Surv(time,status))
#make a CPH for each imputation
for(i in seq(5)){
  assign(paste("mod_",i,sep=""),
         cph(survmod~trt+celltype+karno+age+prior,
             data=complete(impvet,i),x=TRUE,y=TRUE))
}
#Now there is a CPH model for mod_1, mod_2, mod_3, mod_4, and mod_5.
Now, if I were just working with one CPH model, I would do this:
validate(mod_1,B=20)
The problem I'm having is how to take the 5 CPH models (1 for each imputation), and be able to create a pooled model that I can then use with rms. I know that the mice package has some built-in pooling functions, but I don't believe they work with the cph object in rms. The key here is being able to still use rms after pooling. I looked into using Harrell's aregImpute() function, but I'm having some trouble following the examples and documentation; mice seems simpler to use.
Best Answer
The fit.mult.impute function in the Hmisc package will draw imputations created from mice just as it will from aregImpute. cph will work with fit.mult.impute. The harder question is how to do validation through resampling when also doing multiple imputation. I don't think anyone has really solved that. I usually take the easy way out and use single imputation to validate the model, using the Hmisc transcan function, but use multiple imputation to fit the final model and to get standard errors.
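As a rough illustration of the first point, the sketch below passes a mice object to fit.mult.impute with cph as the fitter. It is a minimal sketch under some assumptions: the object names (vet, imp, fit), the seed, and the choice of m = 5 are mine rather than from the question, and the data are copied with survival::veteran instead of data(veteran). It covers only the pooled fit, not the resampling validation discussed above.

```r
library(survival)
library(rms)   # attaches Hmisc, which provides fit.mult.impute()
library(mice)

set.seed(1)    # arbitrary seed, only so the sketch is reproducible

# Work on a copy of the veteran data and code the categorical predictors as factors
vet <- survival::veteran
vet$trt   <- factor(vet$trt)
vet$prior <- factor(vet$prior)

# Set a few random values to NA, as in the question
vet[sample(nrow(vet), 4), "trt"]      <- NA
vet[sample(nrow(vet), 4), "celltype"] <- NA
vet[sample(nrow(vet), 4), "age"]      <- NA

# Five multiple imputations with mice
imp <- mice(vet, m = 5, printFlag = FALSE)

# Fit cph on each completed data set and pool the results.
# xtrans takes the mice object; x = TRUE and y = TRUE are passed on to cph.
fit <- fit.mult.impute(Surv(time, status) ~ trt + celltype + karno + age + prior,
                       fitter = cph, xtrans = imp, data = vet,
                       n.impute = 5, pr = FALSE, x = TRUE, y = TRUE)
print(fit)
```

The returned object still inherits from cph, so the usual rms tools for inspecting a fit, such as print and anova, should apply to it. The variance estimates are adjusted for the imputation.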