Survival Analysis – Drawing Survival Curves for Two Groups After Multiple Imputation

kaplan-meier, multiple-imputation, r, survival

I wonder how to draw Kaplan-Meier survival curves when there is no missing information in the survival variables, only in the stratification covariate.
For example, we know the follow-up time and the status at the end of follow-up for all patients, but sex is missing for some of them, and we need to plot separate survival curves for men and women.

How should we proceed if we use multiple imputation for the stratification covariate?
Should we combine the survival curves of each group using Rubin's rules with a log-log transformation?
Note that an individual may belong to group 1 in one imputed data set and to group 2 in the next, although his or her survival data are the same in both.

Does anyone know a reference on this particular problem? Maybe Stef van Buuren's book?

(By the way, I am working in R.)

Thanks a lot

Best Answer

This Cross Validated page addresses a similar situation. The idea is to pool a characteristic that is close to normally distributed, use Rubin's Rules to pool among the imputed data sets to get means and error estimates, then back-transform to the desired scale for data display. As David Luke Thiessen notes in a comment, Stef van Buuren recommends pooling a complementary log-log transformation for survival probabilities.

I don't see that imputation of the stratification variable adds any further complexity. The paper cited on the above Cross Validated page as an example of using the complementary log-log transformation, Morisot et al., Prostate cancer: net survival and cause-specific survival rates after multiple imputation, BMC Med Res Methodol 15, 54 (2015), imputed causes of death. That would seem to be at least as troublesome as evaluating survival curves after imputing sex in your example.
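The pooling approach described above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the data are simulated, the column names (time, status, sex) and the helper pool_km() are hypothetical, and the imputation step is a crude stand-in (draws from a fitted logistic model that ignore parameter uncertainty) for a proper imputation procedure. Within each imputed data set the Kaplan-Meier estimate is transformed to the complementary log-log scale, its variance is obtained by the delta method, the results are combined with Rubin's rules, and the pooled values are back-transformed.

```r
library(survival)

set.seed(1)
n <- 300
sex <- rbinom(n, 1, 0.5)                        # hypothetical coding: 0 = men, 1 = women
event_time <- rexp(n, rate = ifelse(sex == 1, 0.08, 0.12))
cens_time  <- runif(n, 0, 20)
dat <- data.frame(time   = pmin(event_time, cens_time),
                  status = as.integer(event_time <= cens_time),
                  sex    = sex)
dat$sex[sample(n, 60)] <- NA                    # sex missing for 20% of patients

# Crude stand-in for multiple imputation (assumption: replace with a
# proper imputation procedure in real work).
m   <- 20
obs <- !is.na(dat$sex)
imp_model <- glm(sex ~ time + status, family = binomial, data = dat[obs, ])
p_miss    <- predict(imp_model, newdata = dat[!obs, ], type = "response")
imputed <- lapply(seq_len(m), function(i) {
  d <- dat
  d$sex[!obs] <- rbinom(sum(!obs), 1, p_miss)
  d
})

# Pool Kaplan-Meier estimates at the given times on the cloglog scale.
# Requires 0 < S(t) < 1 at every requested time in every imputed data set.
pool_km <- function(imputed, times, level = 0.95) {
  m <- length(imputed)
  groups <- sort(unique(imputed[[1]]$sex))
  out <- lapply(groups, function(g) {
    fits <- lapply(imputed, function(d) {
      s <- summary(survfit(Surv(time, status) ~ 1, data = d[d$sex == g, ]),
                   times = times, extend = TRUE)
      list(q = log(-log(s$surv)),                       # cloglog estimate
           u = (s$std.err / (s$surv * log(s$surv)))^2)  # delta-method variance
    })
    Q <- do.call(rbind, lapply(fits, `[[`, "q"))        # m x length(times)
    U <- do.call(rbind, lapply(fits, `[[`, "u"))
    qbar <- colMeans(Q)                                 # pooled estimate
    W    <- colMeans(U)                                 # within-imputation variance
    B    <- apply(Q, 2, var)                            # between-imputation variance
    Tvar <- W + (1 + 1 / m) * B                         # total variance
    df   <- (m - 1) * (1 + W / ((1 + 1 / m) * B))^2     # Rubin's degrees of freedom
    crit <- qt(1 - (1 - level) / 2, df)
    # cloglog is decreasing in S, so the interval limits swap on back-transformation
    data.frame(sex = g, time = times,
               surv  = exp(-exp(qbar)),
               lower = exp(-exp(qbar + crit * sqrt(Tvar))),
               upper = exp(-exp(qbar - crit * sqrt(Tvar))))
  })
  do.call(rbind, out)
}

res <- pool_km(imputed, times = c(2, 5, 10))
print(res)
```

To draw curves rather than a few time points, the same function could be evaluated on a fine grid of times and the pooled surv values plotted against time with type = "s" for each group. The grid should start after the first event in every group, because the complementary log-log transformation is undefined where the estimated survival is exactly 1 (or 0).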

Related Solutions

Solved – Compare two survival curves for paired data

If you want to compare the performance of the two survival models, calculating C-statistics (Harrell's C, survival ROC, ...) might be a more reasonable approach. Calculate the C-statistic of each survival model and compare them (a p-value can be obtained).
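As a hedged sketch of such a comparison, the example below uses concordance() from the survival package on two Cox models fitted to the same simulated data. The data, the covariate names x1 and x2, and the choice of Cox models are assumptions made for illustration only. The difference in concordance is tested with a contrast on the joint estimate and its variance matrix, using a normal approximation.

```r
library(survival)

set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
t_event <- rexp(n, rate = 0.1 * exp(0.5 * x1 + 0.8 * x2))
t_cens  <- rexp(n, rate = 0.05)
d <- data.frame(time   = pmin(t_event, t_cens),
                status = as.integer(t_event <= t_cens),
                x1 = x1, x2 = x2)

# Two competing models fitted to the same subjects
fit1 <- coxph(Surv(time, status) ~ x1,      data = d)
fit2 <- coxph(Surv(time, status) ~ x1 + x2, data = d)

# Joint concordance estimates with their variance-covariance matrix
cfit <- concordance(fit1, fit2)

# Contrast: C(fit2) - C(fit1)
contr  <- c(-1, 1)
c_diff <- drop(contr %*% coef(cfit))
c_se   <- sqrt(drop(contr %*% vcov(cfit) %*% contr))
z      <- c_diff / c_se
p      <- 2 * pnorm(-abs(z))                # two-sided p-value

print(c(difference = c_diff, se = c_se, z = z, p = p))
```

Because both models are evaluated on the same subjects, their C-statistics are correlated; the covariance term in the variance matrix accounts for this.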

https://rpubs.com/kaz_yos/survival-auc

The link shows various tools for computing C-statistics for survival models.

Solved – Comparison of Kaplan-Meier curves across ordered groups

Function comp in survMisc may be close to what you're after. From the docs:

Tests for trend are designed to detect ordered differences in survival curves.
That is, for at least one group: $$S_1(t) \geq S_2(t) \geq ... \geq S_K(t) \quad t \leq \tau$$ where $\tau$ is the largest $t$ where all groups have at least one subject at risk. The null hypothesis is that $$S_1(t) = S_2(t) = ... = S_K(t) \quad t \leq \tau$$ Scores used to construct the test are typically $s = 1,2,...,K$, but may be given as a vector representing a numeric characteristic of the group.
They are calculated by finding: $$Z_j(t_i) = \sum_{t_i \leq \tau} W(t_i)[e_{ji} - n_{ji} \frac{e_i}{n_i}], \quad j=1,2,...,K$$ The test statistic is: $$Z = \frac{ \sum_{j=1}^K s_jZ_j(\tau)}{\sqrt{\sum_{j=1}^K \sum_{g=1}^K s_js_g \sigma_{jg}}}$$ where $\sigma$ is the appropriate element in the variance-covariance matrix (see COV).
If ordering is present, the statistic $Z$ will be greater than the upper $\alpha$-th percentile of a standard normal distribution.

For example:

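library(survival)   # Surv(), survfit()
library(survMisc)   # comp()
data(larynx, package="KMsurv")
# Kaplan-Meier fit by disease stage (ordered groups)
s4 <- survfit(Surv(time, delta) ~ stage, data=larynx)
comp(s4)            # newer versions of survMisc: comp(ten(s4))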

This will give tests for trend with various weights (as are used with the standard log-rank test):

$tests$trendTests
                                     Z       p
Log-rank                      3.718959 0.00010
Gehan-Breslow (mod~ Wilcoxon) 4.224765 0.00001
Tarone-Ware                   4.058010 0.00002
Peto-Peto                     4.129343 0.00002
Mod~ Peto-Peto (Andersen)     4.136319 0.00002
Trend F-H with p=1, q=1       2.396992 0.00827

The package has since changed slightly: the results are now stored under tft (which stands for tests for trend) rather than tests or trendTests. In addition, comp() must now be called on a ten object: comp(ten(<whatever your survival-fit object is>)).
