Cox Model – Application of Multiple Imputation Explained

cox-model, multiple-imputation, survival

I'm struggling to understand some aspects of multiple imputation when the goal is to fit Cox regressions to my data.

First of all, my dataset is not yet set up for survival analysis. I know what my time_to_event variable and my event indicator variable will be, but I haven't coded them yet.

Most sources I've read say that multiple imputation is best done on raw data. By that logic, my dataset as it stands, before any modification for survival analysis, would be the most suitable input for multiple imputation.

However, I have also read that when doing multiple imputation with the intention of fitting Cox regressions, you should include the Nelson-Aalen cumulative hazard and the binary event/censoring indicator among the imputation predictors. To do that, I would need to set up my data for survival analysis beforehand.
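As a rough illustration of that recommendation, here is a minimal sketch in Python/NumPy (an assumption; no software is named in the question). It computes the Nelson-Aalen cumulative hazard H(t) at each subject's own observed time (event or censoring time) and stacks it with the event indicator next to the covariates that need imputing. The data and the `covariates` array are hypothetical, and tied event times are handled by counting all events at a given time together.

```python
import numpy as np

def nelson_aalen_at_own_time(time, event):
    """Nelson-Aalen cumulative hazard H(t) evaluated at each subject's
    own observed time (event or censoring time)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    # Distinct times at which at least one event occurred.
    event_times = np.unique(time[event == 1])
    if event_times.size == 0:
        return np.zeros_like(time)  # no events: H(t) = 0 everywhere
    # d_k: number of events at each distinct event time.
    d = np.array([np.sum((time == t) & (event == 1)) for t in event_times])
    # n_k: number still at risk (observed time >= t) at each event time.
    n = np.array([np.sum(time >= t) for t in event_times])
    # H after each event time is the running sum of d_k / n_k.
    cum_haz = np.cumsum(d / n)
    # For each subject, take H at the last event time <= their own time.
    idx = np.searchsorted(event_times, time, side="right") - 1
    return np.where(idx >= 0, cum_haz[np.clip(idx, 0, None)], 0.0)

# Hypothetical survival data: observed times and event (1) / censored (0).
time = np.array([2.0, 3.0, 3.0, 5.0, 8.0])
event = np.array([1, 1, 0, 1, 0])
H = nelson_aalen_at_own_time(time, event)

# Hypothetical covariates with missing values (NaN) to be imputed.
covariates = np.array([[1.2, np.nan], [0.7, 3.0], [np.nan, 2.5],
                       [1.9, 4.1], [0.3, np.nan]])

# Predictor matrix for the imputation model: covariates + event + H.
imputation_matrix = np.column_stack([covariates, event, H])
```

Here `event` and `H` are complete, so they act only as predictors in the imputation model and are not themselves imputed; the matrix would be passed to whatever imputation routine you use.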

My first question is: which of the two approaches is more reliable?

If the answer is the latter, that raises a follow-up question. If I derive my event indicator and time_to_event variables from the age at diagnosis of a disease (e.g., age at diagnosis present = event occurred), should I remove the diagnosis-age variable before imputation, since it is highly correlated with the variables I just created?
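To make the coding step concrete, here is a minimal NumPy sketch of deriving the two survival variables from diagnosis age. The column names `age_at_entry`, `age_at_diagnosis` and `age_at_last_followup` are hypothetical, time zero is assumed to be study entry, and a missing diagnosis age is assumed to mean "not diagnosed during follow-up" (so it is censoring, not missing data).

```python
import numpy as np

# Hypothetical raw columns (ages in years); NaN diagnosis age = never diagnosed.
age_at_entry = np.array([40.0, 52.0, 61.0, 45.0])
age_at_diagnosis = np.array([47.5, np.nan, 66.0, np.nan])
age_at_last_followup = np.array([47.5, 60.0, 66.0, 53.0])

# Event indicator: 1 if a diagnosis age is recorded, 0 if censored.
diagnosed = ~np.isnan(age_at_diagnosis)
event = diagnosed.astype(int)

# Exit age: diagnosis age for events, last follow-up age for censored subjects.
exit_age = np.where(diagnosed, age_at_diagnosis, age_at_last_followup)

# Time origin assumed to be study entry.
time_to_event = exit_age - age_at_entry

# Sanity check: nobody exits before entering.
assert np.all(time_to_event >= 0)
```

Note that for subjects with an event, `age_at_diagnosis` equals `age_at_entry + time_to_event`, so after this step the original diagnosis-age column carries no information beyond the new variables and the entry age.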

Thank you.

Best Answer

Coding your data so that it is suitable for survival analysis is not a problem here. You will still have "raw data"; the only "modification" is imposing on the data your choice of a time reference (e.g., does time = 0 represent birth date or study-entry date) for the survival function. That choice will then define the Nelson-Aalen estimate to include in the imputation scheme.
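To illustrate how the choice of time origin changes the Nelson-Aalen values that end up in the imputation model, here is a minimal NumPy sketch on made-up data. It assumes that when age (time since birth) is the time scale, each subject only enters the risk set at their study-entry age (delayed entry, or left truncation), so a subject counts as at risk at time t only if entry < t <= exit.

```python
import numpy as np

def nelson_aalen_delayed_entry(entry, stop, event):
    """Nelson-Aalen H(t) at each subject's stop time, allowing delayed
    entry: a subject is at risk at t only if entry < t <= stop."""
    entry = np.asarray(entry, dtype=float)
    stop = np.asarray(stop, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(stop[event == 1])
    if event_times.size == 0:
        return np.zeros_like(stop)  # no events: H(t) = 0 everywhere
    # Events and risk-set sizes at each distinct event time.
    d = np.array([np.sum((stop == t) & (event == 1)) for t in event_times])
    n = np.array([np.sum((entry < t) & (stop >= t)) for t in event_times])
    cum_haz = np.cumsum(d / n)
    # H at the last event time <= each subject's own stop time.
    idx = np.searchsorted(event_times, stop, side="right") - 1
    return np.where(idx >= 0, cum_haz[np.clip(idx, 0, None)], 0.0)

# Hypothetical subjects: ages at study entry and at exit, plus event status.
entry_age = np.array([40.0, 52.0, 61.0, 45.0])
exit_age = np.array([47.5, 60.0, 66.0, 53.0])
event = np.array([1, 0, 1, 0])

# Origin = study entry: everyone starts at time 0.
H_study_time = nelson_aalen_delayed_entry(
    np.zeros_like(entry_age), exit_age - entry_age, event)

# Origin = birth: time is age, and subjects enter at their entry age.
H_age_scale = nelson_aalen_delayed_entry(entry_age, exit_age, event)
```

The same four subjects get different cumulative-hazard values under the two origins, because the risk sets at each event time differ; this is why the time reference has to be fixed before the Nelson-Aalen predictor is computed.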

Stef van Buuren outlines considerations for imputation of survival data in Flexible Imputation of Missing Data. Having correlated predictors upon which to base imputations isn't necessarily a problem, but you should evaluate the imputation scheme to make sure that it is appropriate for your data, based on your understanding of the subject matter. Simply accepting defaults isn't the best approach.
