I haven't been able to gain access to the Simon and Makuch article mentioned above, but having researched the topic I found:
Steven M Snapinn, Qi Jiang & Boris Iglewicz (2005) Illustrating the Impact of a Time-Varying Covariate With an Extended Kaplan-Meier Estimator, The American Statistician, 59:4, 301-307.
That article proposes a time-dependent Kaplan-Meier plot (KM) by simply updating the cohorts at all event times. It also cites the Simon and Makuch article for proposing a similar idea. The regular KM does not allow this, it only allows a fixed division into groups. The proposed method actually splits survival time according to covariate status - just like one could do when estimating a Cox model with piecewise constant covariates. For the Cox model this is a viable idea, and a standard one. It is, however, more intricate when doing a KM plot. Let me illustrate it with a simulation example.
Let's assume that we have no censoring, but some event (e.g., giving birth) that might or might not occur before time of death. Let's also assume constant hazards for the sake of simplicity. We will also assume that giving birth does not alter the hazard of dying. We will now follow the procedure prescribed in the above article. The article clearly states how this is done in R, simply split your subjects on time of giving birth such that they're constant in your grouping variable. Then use the counting process formulation in the Surv
function. In code
library(survival)
library(ggplot2)
n <- 10000
data <- data.frame(id = seq(n),
preg = rexp(n, 1),
death = rexp(n, .5),
enter = 0,
per = NA,
event = 1)
data$exit <- data$death
data0 <- data
data0$exit <- with(data, pmin(preg, death))
data0$per <- 0
data0$event[with(data0, preg < death)] <- 0
data1 <- subset(data, preg < death)
data1$enter <- data1$preg
data1$per <- 1
data <- rbind(data0, data1)
data <- data[order(data$id), ]
Sfit <- survfit(Surv(time = enter, time2 = exit, event = event) ~ per, data = data)
autoplot(Sfit, censSize = 0)$plot
I'm more or less splitting it "by hand". We could use survSplit
as well. The procedure actually gives me a very nice estimate.
We get almost identical estimates for the two groups as we should. But actually, my simulation was perhaps a bit unrealistic. Let's say a woman cannot give birth in the first two time units for some reason. This is at least reasonable in your example: there'll be some time between two pregnancies corresponding to the same woman. Making a small addition to the code
data <- data.frame(id = seq(n),
preg = rexp(n, 1) + 2,
death = rexp(n, .5),
enter = 0,
preg = NA,
event = 1)
we get the following plot:
The same thing would be happening with your data. You will not see any 3rd pregnancies for at least some initial period of time, meaning that your estimate will be 1 for that group and that period of time. This is in my opinion a misrepresentation of your data. Consider my simulation. The hazards are identical, but for every time point the per1
estimate is greater than the per0
estimate.
You could consider different remedies to this problem. You propose pasting them together at some point (let the per1
-curve start from a certain point on the per0
-curve). I like this idea. If I do it on the simulation data, we get:
In our specific case, I think this represents data way better, but I don't know of any published results that support this approach. Heuristically, one can use the argument I presented in another answer:
KM plot with time-varying coefficient
Best Answer
The patients at risk in a survival analysis must start time at risk at entry into the risk set. For a time-dependent covariate, time zero is the time when the at risk unit transfers from the initial to the new status. Thus time at risk in this new state begins at zero and everyone transferring is alive by definition and survival = 1.0 at time zero. The time of transfer in a time-dependent covariate analysis is usually arbitrary. If time at transfer to a new group is fixed, this suggests a possible crossover trial, where crossover time is specified, or some rare biologic condition where timing of group transfer is uniform. Regardless, the best way to graph this new group assignment is to start all units at tome 0 with survival at 1.0.
For example, in the Stanford transplant data, transplant occurs at a random time and it makes most sense to plot two curves beginning at time 0 and survival = 1.0: 1) all patients waiting, 2) those patients transplanted.
Using the landmark of the transplant time is not possible- the times are non-uniform. Shifting to a shared landmark time is possible, if the time is uniform, but the curve shifted to the right seems to imply implausible survival functions at plausible times at risk following transfer to the new at risk group.