I want to simulate survival times from the model
$\lambda(t|X_1,X_2) = \lambda_0(t) \exp(\gamma_1X_1+\gamma_2X_2)$
where covariates are given by $X_1\sim$Binomial(1,0.5), $X_2\sim$Uniform(0,1), the true parameter values are
$\gamma_1=-0.5$, $\gamma_2=1.5$ and the true baseline hazard function is as follows
$ \lambda_0(t)= \left\{
\begin{array}{ll}
\exp(-0.3t); & 0<t\leq 1 \\
\exp(-0.3); & 1<t\leq2.5 \\
\exp(0.3(t-3.5)); & t>2.5
\end{array}
\right. $
The censoring time is from the exponential distribution with mean 2.5.
For a simple baseline hazard function(e.g. $\lambda_0(t)=\lambda$), i used the following algorithm to generate survival times
$$T=H_0^-\{-\log(u) \times \exp(\gamma_1X_1+\gamma_2X_2)\}
=\lambda^-\{-\log(u) \times \exp(\gamma_1X_1+\gamma_2X_2))\}$$
where $u\sim$Uniform(0,1). The R codes are as follows
gamma <- c(-0.5, 1.5)
X1 <- rbinom(n=200,size=1,prob=.5)
X2 <- round(runif(n=200,0,1),2)
time <- (1/.3)*(-log(runif(200))/exp(gamma[1]*X1+gamma[2]*X2))
cen <- rexp(200,rate=1/2.5)
event <- as.numeric(time <= cen)
But i'm struggling to generate for the piecewise baseline hazard function that i mentioned above. Would anyone please help me to generate survival times for the above settings using R.
Best Answer
First: you can sample directly from any survival function, $S(t)$ which shows the time-dependent probability of living to that time or longer. The way to do this is by generating uniform RVs $u$ as quantiles and finding $S^{-1}(u)$. This can be done analytically, or below I have an example of how to do it numerically with a pseudocontinuous or discrete time using
colSums(outer(x,y,'<'))
which beatsquantile
by many flops.Second: the survival function is related to the hazard via: $S(t) = exp(-\Lambda(t))$ where $\Lambda(t) = \int_{0}^t \lambda(s) ds$ is called the cumulative hazard function.
So for simplicity let's sample just from the baseline hazard function, omitting any influence of covariates. As a note, the influence of covariates can be added back by generating survival curves for each individual in the sample by multiplying the hazard function by their exponentiated linear predictor. The cumulative hazard could be found analytically, but a numerical approach with a range of possible failure times is given by:
Gives: