I was wondering if any of you have experience in fitting a Cox model to data with missing covariates. Do you know of any reference addressing the issues associated with Cox regression with missing covariates? I know of the approach using the Expectation Maximization (EM) algorithm, but I want to know if there is a publication comparing the different methods out there.
Solved – Cox model with missing covariates
cox-model, missing-data
Related Solutions
- Yes, your data format is ok.
- I suggest two methods. 1) If the value of Glu or SBP is missing at a measurement, carry the most recent available value forward (last observation carried forward). Then fit a time-dependent Cox regression.
Example (based on your sample table):
Row| ID Start Stop Status Age Sex Supp SuppSum frac Glu SBP
--------------------------------------------------------------------
1 | 1 0 1 0 29 0 1 1 100.0% ... ...
2 | 1 1 2 0 29 0 1 2 100.0% 4.4 121
3 | 1 2 3 0 29 0 1 3 100.0% 4.4 133
4 | 1 3 4 0 29 0 1 4 100.0% 5.0 125
5 | 1 4 5 0 29 0 1 5 100.0% 5.0 125
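The carry-forward step above can be sketched in a few lines of pandas. This is a minimal illustration using hypothetical data mirroring the sample table; the `ID`/`Start`/`Stop`/`Glu`/`SBP` column names are taken from that table, and `NaN` marks visits where the biomarker was not measured:

```python
import pandas as pd

# Hypothetical long-format (start-stop) data mirroring the sample table;
# NaN marks intervals where Glu/SBP were not measured.
df = pd.DataFrame({
    "ID":    [1, 1, 1, 1, 1],
    "Start": [0, 1, 2, 3, 4],
    "Stop":  [1, 2, 3, 4, 5],
    "Glu":   [None, 4.4, None, 5.0, None],
    "SBP":   [None, 121, 133, 125, None],
})

# Last observation carried forward, within each subject only, so one
# subject's values never leak into another subject's intervals.
df[["Glu", "SBP"]] = df.groupby("ID")[["Glu", "SBP"]].ffill()

print(df)
```

The filled start-stop table is the input a time-varying Cox fitter expects (e.g. `survival::coxph` with `Surv(Start, Stop, Status)` in R, or lifelines' `CoxTimeVaryingFitter` in Python). Note that the first interval stays missing, since there is nothing earlier to carry forward.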
2) Consider a joint model. A joint model fits a mixed model to the repeatedly measured values and then uses the fitted trajectory as a covariate in the survival model. This approach has merits for handling both missing data and censoring. The link below provides a simple tutorial on joint models.
https://www.r-bloggers.com/joint-models-for-longitudinal-and-survival-data/
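A true joint model estimates the longitudinal and survival submodels simultaneously (as in the JM/JMbayes R packages covered by the linked tutorial). As a rough sketch of the idea only, here is the cruder two-stage version with hypothetical data: summarise each subject's biomarker trajectory first, then plug the fitted value into the survival model. All subject IDs, times, and values below are invented for illustration:

```python
import numpy as np

# Hypothetical repeated measurements per subject: (times, biomarker values).
# A real joint model would fit one mixed model to all subjects at once and
# link it directly to the hazard, rather than this crude per-subject stage.
subjects = {
    1: ([0, 1, 3], [4.4, 4.6, 5.0]),
    2: ([0, 2, 4], [5.1, 5.0, 4.8]),
    3: ([1, 2, 3], [4.0, 4.3, 4.5]),
}

# Stage 1: summarise each trajectory with a least-squares linear fit.
fitted_at = {}
for sid, (t, y) in subjects.items():
    slope, intercept = np.polyfit(t, y, 1)
    # Stage 2 would use the fitted value (or slope) at each risk time as a
    # covariate in the Cox model; here we just evaluate it at t = 2.
    fitted_at[sid] = intercept + slope * 2.0

print(fitted_at)
```

The two-stage shortcut ignores the uncertainty in the stage-1 fits, which is precisely what the joint likelihood of a proper joint model accounts for.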
It is uncertain which statistical method should be used on the matched data, as you must respect the matched nature of the new artificial dataset. An equally big problem is that the Cox coefficient for the exposure in the matched dataset has a different meaning and is usually smaller in absolute value than had you done a full conditional-on-covariates analysis, due to non-collapsibility of hazard ratios. By using the propensity analysis you are not respecting outcome heterogeneity that would have been explainable by covariate adjustment.
Think of propensity scores as part of a data reduction strategy (unsupervised learning) that assists when there are too many covariates to adjust for relative to the number of outcome events (you need at least, say, 4 events per covariate for stable adjustment). I say "part" because you also need to adjust for the big predictors of outcome, which a matched analysis doesn't allow you to do, so covariate adjustment using a spline of the logit of the propensity would be preferred to matching if you really, really needed to do data reduction.
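The estimate-then-adjust route can be sketched as follows. This is a minimal illustration with simulated data, not the answerer's code: a logistic propensity model is fit by Newton-Raphson (IRLS), and the logit of the estimated propensity becomes the adjustment covariate; fitting the spline basis and the Cox model itself is left as a comment:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 subjects, 3 covariates, binary exposure whose
# probability depends on the covariates.
n = 200
X = rng.normal(size=(n, 3))
true_beta = np.array([0.8, -0.5, 0.3])
treat = rng.binomial(1, 1 / (1 + np.exp(-(X @ true_beta))))

# Fit a logistic propensity model by Newton-Raphson (IRLS).
Xd = np.column_stack([np.ones(n), X])        # add intercept
beta = np.zeros(Xd.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-(Xd @ beta)))
    grad = Xd.T @ (treat - p)                # score
    hess = Xd.T @ (Xd * (p * (1 - p))[:, None])  # information
    beta = beta + np.linalg.solve(hess, grad)

ps = 1 / (1 + np.exp(-(Xd @ beta)))          # estimated propensity score
logit_ps = np.log(ps / (1 - ps))             # covariate for adjustment

# In the Cox model one would then adjust for a spline basis of logit_ps
# (plus the strong outcome predictors), rather than matching on ps.
print(logit_ps[:5])
```

Adjusting for a smooth function of `logit_ps` keeps all subjects in the analysis, in contrast to matching, which discards unmatched subjects and changes the estimand as described above.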
Best Answer
Flexible Imputation of Missing Data, by Stef van Buuren, is an outstanding book that covers this area well.