Survival Times – Simulation Through Root Finding


I am simulating survival times from a joint model of longitudinal and survival data,
\begin{equation}
\begin{split}
& Y_i(t) \sim N(\mu_i(t), \sigma_y^2) \\
& \mu_i(t) = \beta_{0i} + \beta_{1i} t + \beta_2 x_{1i} + \beta_3 x_{2i} \\
& \beta_{0i} = \beta_{00} + b_{0i}\\
& \beta_{1i} = \beta_{10} + b_{1i} \\
& (b_{0i}, b_{1i})^T \sim N(0, \Sigma)\\
& h_i(t) = \delta (t^{\delta-1})
\exp (\gamma_0 + \gamma_1 x_{1i} + \gamma_2 x_{2i} + \alpha \mu_i(t)) \\
\end{split}
\end{equation}
I understand that under a constant hazard (exponential) there is an analytic solution via the inverse-transform principle. However, I have found that I have to be careful with my choice of coefficients, and that approach restricts my simulation to an exponential model.
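The following is a worked illustration. The shorthand $c_i$ is introduced here and does not appear in the question. Collect the time-constant terms as $c_i = \gamma_0 + \gamma_1 x_{1i} + \gamma_2 x_{2i} + \alpha(\beta_{0i} + \beta_2 x_{1i} + \beta_3 x_{2i})$, so that $h_i(t) = \delta t^{\delta-1}\exp(c_i + \alpha\beta_{1i} t)$. For $\delta = 1$ and $\alpha\beta_{1i} \neq 0$, the cumulative hazard and its inverse still have closed forms:

$$
\begin{aligned}
H_i(t) &= \int_0^t \exp(c_i + \alpha\beta_{1i} s)\,ds = \frac{e^{c_i}}{\alpha\beta_{1i}}\left(e^{\alpha\beta_{1i} t} - 1\right), \\
S_i(T_i) = u \;&\Longrightarrow\; T_i = \frac{1}{\alpha\beta_{1i}}\log\!\left(1 + \alpha\beta_{1i}\, e^{-c_i}\,(-\log u)\right).
\end{aligned}
$$

If $\alpha\beta_{1i} < 0$, then $H_i(t)$ is bounded above by $e^{c_i}/|\alpha\beta_{1i}|$. Any draw with $-\log u$ at or above that bound has no finite solution, because the argument of the logarithm is not positive. This is one way the choice of coefficients can break the analytic approach.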

So after some research, I am using the simsurv package, and I believe it applies root finding (uniroot) under the hood. I have to specify an upper bound (also known as the maximum follow-up time). Survival times exceeding this bound are administratively censored.

Does this mean I cannot obtain the exact survival time with this method, since the root finding depends on the upper bound I specify?
If so, how can one apply non-informative censoring without knowing the true survival times? My understanding is that we need to define a censoring distribution and then take $\min(T_i, C_i)$ for each individual to get the observed survival time. For some individuals, however, we won't know $T_i$, the exact survival time.

Best Answer

As this example model from the simsurv vignette is a Weibull model with proportional hazards, there isn't a problem with simulating exact survival times, provided that the times are less than the upper time limit that you specify. Integrating the instantaneous hazard in the last line of your example over time gives the individual's cumulative hazard as a (continuously increasing) function of time, $H_i(t)$. The upper time limit that you specify just makes sure that the integral will be finite.

The survival function for the individual is $S_i(t) = \exp(-H_i(t))$. An exact survival time is obtained by sampling a survival probability uniformly on (0,1) and then solving that relationship numerically for the corresponding survival time. The upper time limit only comes into play when the sampled value of $S_i(t)$ is so low that the corresponding value of $t$ exceeds the range over which $H_i(t)$ was calculated. Those are the cases "administratively censored" at that upper time limit.
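Below is a minimal base-R sketch of this procedure for the model in the question. It assumes hypothetical parameter values, none of which come from the question or the simsurv vignette. It also calls `stats::integrate` and `stats::uniroot` directly instead of going through simsurv. The steps are: draw $u$, compare $-\log u$ with $H_i$ at the upper limit, and only solve $H_i(t) = -\log u$ when a root exists below that limit.

```r
# Hypothetical parameter values, chosen only for illustration
set.seed(42)
n <- 200
maxt <- 10                          # upper time limit (maximum follow-up)
delta <- 1.5                        # Weibull shape parameter
gammas <- c(-4, 0.5, -0.3)          # gamma_0, gamma_1, gamma_2
alpha <- 0.2                        # association parameter
betas <- c(10, 0.5, 1, -0.5)        # beta_00, beta_10, beta_2, beta_3
Sigma <- matrix(c(1, 0.1, 0.1, 0.05), nrow = 2)

x1 <- rbinom(n, 1, 0.5)
x2 <- rnorm(n)
# Random effects (b_0i, b_1i) ~ N(0, Sigma) via the Cholesky factor of Sigma
b <- matrix(rnorm(2 * n), ncol = 2) %*% chol(Sigma)

# Hazard h_i(t) from the question (vectorised over t, as integrate() requires)
hazard_i <- function(t, i) {
  mu <- (betas[1] + b[i, 1]) + (betas[2] + b[i, 2]) * t +
    betas[3] * x1[i] + betas[4] * x2[i]
  delta * t^(delta - 1) *
    exp(gammas[1] + gammas[2] * x1[i] + gammas[3] * x2[i] + alpha * mu)
}

# Cumulative hazard H_i(t) by numerical integration
cumhaz_i <- function(t, i) {
  if (t <= 0) return(0)
  integrate(hazard_i, lower = 0, upper = t, i = i)$value
}

# Inverse transform: solve S_i(t) = u, i.e. H_i(t) = -log(u), on (0, maxt]
sim_one <- function(i) {
  u <- runif(1)
  target <- -log(u)
  if (cumhaz_i(maxt, i) < target) {
    # No root before the upper limit: administratively censored at maxt
    return(c(time = maxt, status = 0, u = u))
  }
  root <- uniroot(function(t) cumhaz_i(t, i) - target,
                  lower = 0, upper = maxt, tol = 1e-8)
  c(time = root$root, status = 1, u = u)
}

sims <- t(sapply(seq_len(n), sim_one))
```

Only the rows with `status == 0` lack an exact survival time. Their recorded time is the upper limit, which matches the description above.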

The only individuals for which you won't have exact survival times are those who are "administratively censored" by your choice of the upper time limit. They are treated as having right-censored survival at that upper time limit from the start. You then proceed as you describe to model censoring for all individuals. Some of those "administratively censored" individuals might then end up with even earlier right-censoring times than that upper time limit.
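Here is a sketch of combining the two censoring mechanisms. It assumes Weibull event times drawn by inverse transform as a stand-in for the simsurv output, plus a hypothetical exponential censoring distribution. Because $\min(T_i, C_i, t_{\max}) = \min(\min(T_i, t_{\max}), C_i)$, the unknown times beyond the upper limit are never needed.

```r
set.seed(1)
n <- 1000
maxt <- 10                           # administrative censoring time (upper limit)

# Stand-in event times: Weibull via inverse transform (hypothetical shape/scale)
shape <- 1.5
scale <- 6
u <- runif(n)
t_true <- scale * (-log(u))^(1 / shape)

# In practice only min(T, maxt) is known, with an event flag
t_admin <- pmin(t_true, maxt)
event_admin <- as.integer(t_true <= maxt)

# Independent (non-informative) censoring, hypothetically exponential with mean 15
c_rand <- rexp(n, rate = 1 / 15)

# Observed data use only the administratively censored times
time <- pmin(t_admin, c_rand)
status <- as.integer(event_admin == 1 & t_admin <= c_rand)
```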

This page provides a bit more detail on a closely related question.

Related Solutions

Solved – Finding median survival time from survival function

Assuming your survival curve is the basic Kaplan-Meier type survival curve, this is a way to obtain the median survival time. From Machin et al. Survival Analysis: A Practical Approach:

If there are no censored observations (...) the median survival time, $M$, is estimated by the middle observation of the ranked survival times $t_{(1)}, t_{(2)},\ldots,t_{(n)}$ if the number of observations, $n$, is odd, and by the average of $t_{(\frac{n}{2})}$ and $t_{(\frac{n}{2}+1)}$ if $n$ is even, that is,
$$ M = \left\{\begin{array}{ll} {t_{(\frac{n + 1}{2})}} & \text{if}\ n\ \text{odd}; \\ \frac{1}{2}\left[{t_{(\frac{n}{2})}} + {t_{(\frac{n}{2} + 1)}}\right] & \text{otherwise}. \end{array}\right. $$
In the presence of censored survival times the median survival is estimated by first calculating the Kaplan-Meier survival curve, then finding the value of $M$ that satisfies the equation $S(M) = 0.5$.

This can either be done, as you suggested, using a graphical technique with your curve, or using the survival function estimates used to construct said curve.
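Both rules can be sketched in base R as follows. The helper names `km` and `km_median` are hypothetical. The code fits a plain Kaplan-Meier estimator and then takes the first event time at which the curve drops to 0.5 or below. When the curve sits exactly at 0.5, it takes the midpoint to the next event time, which reproduces the averaging rule for uncensored data with even $n$.

```r
# Kaplan-Meier estimate at the distinct event times
km <- function(time, status) {
  ev_times <- sort(unique(time[status == 1]))
  n_risk <- sapply(ev_times, function(s) sum(time >= s))
  n_event <- sapply(ev_times, function(s) sum(time == s & status == 1))
  data.frame(time = ev_times, surv = cumprod(1 - n_event / n_risk))
}

# Median: first event time at which the KM curve reaches 0.5 or below
km_median <- function(time, status, eps = 1e-9) {
  fit <- km(time, status)
  hit <- which(fit$surv <= 0.5 + eps)[1]
  if (is.na(hit)) return(NA_real_)    # curve never reaches 0.5
  # Curve flat at exactly 0.5: midpoint to the next event time
  if (abs(fit$surv[hit] - 0.5) < eps && hit < nrow(fit)) {
    return((fit$time[hit] + fit$time[hit + 1]) / 2)
  }
  fit$time[hit]
}

km_median(c(1, 2, 3, 4, 5, 6), c(1, 0, 1, 1, 0, 1))
```

In practice, printing a `survfit` object from the survival package reports this median directly.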

Solved – How to simulate survival times using true base line hazard function

First: you can sample directly from any survival function $S(t)$, which gives the probability of surviving to time $t$ or longer. To do this, generate uniform random variables $u$ as quantiles and compute $S^{-1}(u)$. This can be done analytically. Below I have an example of doing it numerically on a pseudo-continuous (discretized) time grid using `colSums(outer(x, y, ">"))`, which is much faster than using `quantile`.
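The analytic route can be sketched with a Weibull survival function, using hypothetical shape and scale values. Because $S$ is decreasing, $P(S^{-1}(U) > t) = P(U < S(t)) = S(t)$, so $S^{-1}(U)$ has survival function $S$.

```r
# Weibull survival function and its closed-form inverse (hypothetical parameters)
shape <- 2
scale <- 3
S <- function(t) exp(-(t / scale)^shape)
S_inv <- function(u) scale * (-log(u))^(1 / shape)

# Inverse-transform sampling: uniform quantiles mapped through S^{-1}
set.seed(123)
u <- runif(10000)
t_sim <- S_inv(u)
```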

Second: the survival function is related to the hazard via $S(t) = \exp(-\Lambda(t))$, where $\Lambda(t) = \int_{0}^t \lambda(s)\, ds$ is called the cumulative hazard function.
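The relation can be checked numerically, assuming a Weibull hazard with hypothetical shape and scale. Integrating the hazard with `integrate` and exponentiating reproduces the closed-form survival function. Base R also provides that function as `pweibull(..., lower.tail = FALSE)`.

```r
# Weibull hazard (hypothetical shape 2, scale 3); Lambda(t) = (t / scale)^shape
shape <- 2
scale <- 3
lambda <- function(s) (shape / scale) * (s / scale)^(shape - 1)

# Numerical cumulative hazard and the implied survival function
Lambda_num <- function(t) integrate(lambda, lower = 0, upper = t)$value
t_grid <- c(0.5, 1, 2, 4)
S_num <- exp(-sapply(t_grid, Lambda_num))

# Closed form for comparison
S_exact <- exp(-(t_grid / scale)^shape)
```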

So for simplicity let's sample just from the baseline hazard function, omitting any influence of covariates. As a note, the influence of covariates can be added back by generating survival curves for each individual in the sample by multiplying the hazard function by their exponentiated linear predictor. The cumulative hazard could be found analytically, but a numerical approach with a range of possible failure times is given by:

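```r
library(survival)

# Pseudo-continuous time grid on [0, 5] with step dt
dt <- 0.01
tdom <- seq(0, 5, by = dt)

# Piecewise baseline hazard: decreasing, then constant, then increasing
haz <- rep(0, length(tdom))
haz[tdom <= 1] <- exp(-0.3 * tdom[tdom <= 1])
haz[tdom > 1 & tdom <= 2.5] <- exp(-0.3)
haz[tdom > 2.5] <- exp(0.3 * (tdom[tdom > 2.5] - 3.5))

# Cumulative hazard (left Riemann sum starting at 0, so S(0) = 1) and survival
cumhaz <- c(0, cumsum(haz[-length(haz)] * dt))
surv <- exp(-cumhaz)  # lower-case name avoids masking survival::Surv

old_par <- par(mfrow = c(3, 1))
plot(tdom, haz, type = 'l', xlab = 'Time domain', ylab = 'Hazard')
plot(tdom, cumhaz, type = 'l', xlab = 'Time domain', ylab = 'Cumulative hazard')
plot(tdom, surv, type = 'l', xlab = 'Time domain', ylab = 'Survival')
par(old_par)

# Generate 100 random samples: for each u, count the grid points with S(t) > u;
# that count indexes the last time at which survival still exceeds u
u <- runif(100)
idx <- colSums(outer(surv, u, `>`))
failtimes <- tdom[idx]
# Draws with u below S(5) run off the end of the grid: treat them as censored at t = 5
status <- as.integer(idx < length(tdom))

plot(survfit(Surv(failtimes, status) ~ 1))
```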
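The code above samples from the baseline hazard only. A covariate can be added back as described earlier. This sketch assumes a hypothetical constant baseline hazard of 0.2, a binary covariate `x`, and a log hazard ratio of `log(2)`. Each individual's cumulative hazard is the baseline one multiplied by $\exp(\text{lp}_i)$, and the same counting trick is then applied column by column.

```r
# Hypothetical baseline: constant hazard 0.2 on a grid over [0, 30]
dt <- 0.05
tdom <- seq(0, 30, by = dt)
haz0 <- rep(0.2, length(tdom))
cumhaz0 <- c(0, cumsum(haz0[-length(haz0)] * dt))

# Hypothetical covariate and coefficient: x = 1 doubles the hazard
set.seed(7)
n <- 4000
x <- rbinom(n, 1, 0.5)
lp <- log(2) * x

# Individual survival S_i(t) = exp(-exp(lp_i) * H0(t)); rows = times, columns = people
surv_i <- exp(-outer(cumhaz0, exp(lp)))

# For each person, count grid points where S_i(t) > u_i (index of last such time)
u <- runif(n)
idx <- colSums(sweep(surv_i, 2, u, FUN = ">"))
failtimes <- tdom[idx]
```

With these assumptions, the mean simulated time should be close to 5 for `x == 0` and close to 2.5 for `x == 1`, up to grid and sampling error.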

