Survival – Combining Adjusted Survival Estimates with Multiple Imputation

multiple-imputation, survival

I've fitted a Cox PH model to multiply imputed datasets in SAS. Now I would like to estimate adjusted survival curves for each treatment group (the main variable in the model). Is there a principled way of combining the adjusted survival curves estimated from each imputation? Can I just estimate the survival probabilities from each imputation and then combine them afterwards using PROC MIANALYZE?

Thanks!

Edit: I've sort of rigged the pooling of the multiple imputations so that the modeleffects argument in PROC MIANALYZE is the estimated survival probability for group A or B at a single timepoint. I have several of these "effects", one per timepoint. The stderr argument in PROC MIANALYZE is the associated standard error. My concern is whether doing it this way still leads to valid inferences.

Best Answer

With the same predictors in all of your models, the basic rule for the pooling step of multiple imputation is:

The pooling step consists of computing the mean over the m repeated analysis, its variance, and its confidence interval or P value.

So in your case, the "mean" etc. would be for each regression coefficient in the Cox models. (It's not clear in your question what you mean by "estimate the probabilities from each imputation.") Calculating the p-values properly might require information about covariances.
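To make the pooling rule concrete, here is a minimal Python sketch of Rubin's rules for a single scalar quantity such as one Cox regression coefficient. The function name `pool_rubin` and the example numbers are made up for illustration and do not come from the question. The sketch handles one coefficient at a time, so it ignores the covariances between coefficients mentioned above.

```python
import math


def pool_rubin(estimates, std_errors):
    """Pool one scalar quantity across m imputations with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching lists from at least two imputations")

    # Pooled point estimate: the mean over the m analyses.
    q_bar = sum(estimates) / m
    # Within-imputation variance: the average squared standard error.
    within = sum(se ** 2 for se in std_errors) / m
    # Between-imputation variance: sample variance of the m estimates.
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    # Total variance, with the (1 + 1/m) correction for finite m.
    total = within + (1 + 1 / m) * between

    # Rubin's degrees of freedom for the t reference distribution;
    # infinite when the estimates do not vary between imputations.
    if between > 0:
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
    else:
        df = math.inf
    return q_bar, math.sqrt(total), df


# Hypothetical log hazard ratios for treatment from m = 5 imputed datasets.
log_hr = [-0.42, -0.38, -0.45, -0.40, -0.36]
se = [0.15, 0.16, 0.15, 0.17, 0.16]
pooled, pooled_se, df = pool_rubin(log_hr, se)
print(f"pooled log HR = {pooled:.3f}, SE = {pooled_se:.3f}, df = {df:.1f}")
```

The pooled standard error is never smaller than the average within-imputation standard error, because the between-imputation spread is added on top of it.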

I don't use SAS, but a quick look at its manual page suggests that PROC MIANALYZE is designed to do what you want. It would seem to work best if you do the analyses in a way that maintains the special data-set structures that SAS uses to keep track of covariances in multiple imputations. (That's certainly the case for the mice package in R for multiple imputation.)

EDIT: In response to the revised question, an important issue is the distribution of the survival probabilities across the imputations, as they might not be as amenable to pooling as the regression coefficients are. For example, if some probabilities are very close to 100% or 0%, then simply averaging the probabilities is likely to lead to trouble, particularly in calculating the p-values for differences between groups. You will be better off following the standard imputation procedure for pooling multiple models into a single model, and then getting the estimates of survival probabilities from that single pooled model.
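If you do end up pooling the probabilities themselves, one commonly suggested workaround (a different technique from the pooled-model route recommended above) is to apply Rubin's rules on a transformed scale and back-transform afterwards. The Python sketch below uses the complementary log-log transform with delta-method standard errors. The function names and the example numbers are hypothetical. It assumes every probability lies strictly between 0 and 1, and it uses a normal quantile rather than Rubin's t reference distribution to keep things short.

```python
import math


def cloglog(s):
    """Complementary log-log transform of a survival probability in (0, 1)."""
    return math.log(-math.log(s))


def inv_cloglog(g):
    """Back-transform from the complementary log-log scale to a probability."""
    return math.exp(-math.exp(g))


def pool_survival(probs, std_errors, z=1.96):
    """Pool survival probabilities at one timepoint on the cloglog scale.

    Returns (estimate, lower, upper) on the probability scale.
    """
    m = len(probs)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching lists from at least two imputations")
    if any(not 0.0 < s < 1.0 for s in probs):
        raise ValueError("probabilities must lie strictly between 0 and 1")

    g = [cloglog(s) for s in probs]
    # Delta method: d/dS log(-log S) = 1 / (S log S).
    se_g = [se / abs(s * math.log(s)) for s, se in zip(probs, std_errors)]

    # Rubin's rules applied on the transformed scale.
    g_bar = sum(g) / m
    within = sum(v ** 2 for v in se_g) / m
    between = sum((x - g_bar) ** 2 for x in g) / (m - 1)
    half_width = z * math.sqrt(within + (1 + 1 / m) * between)

    # cloglog is decreasing in S, so the upper limit on the transformed
    # scale maps to the lower limit on the probability scale.
    return (inv_cloglog(g_bar),
            inv_cloglog(g_bar + half_width),
            inv_cloglog(g_bar - half_width))


# Hypothetical survival estimates near 1 for one group at one timepoint.
est, lower, upper = pool_survival([0.97, 0.98, 0.99], [0.02, 0.02, 0.02])
print(f"pooled S(t) = {est:.3f}, approx. 95% CI ({lower:.3f}, {upper:.3f})")
```

Because the interval is built on the transformed scale, its limits always stay inside (0, 1), which is not guaranteed when probabilities near 0% or 100% are averaged directly.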
