Solved – How to simulate survival times using true base line hazard function

rsimulationsurvival

I want to simulate survival times from the model

$\lambda(t|X_1,X_2) = \lambda_0(t) \exp(\gamma_1X_1+\gamma_2X_2)$

where covariates are given by $X_1\sim$Binomial(1,0.5), $X_2\sim$Uniform(0,1), the true parameter values are

$\gamma_1=-0.5$, $\gamma_2=1.5$ and the true baseline hazard function is as follows

$ \lambda_0(t)= \left\{
\begin{array}{ll}
\exp(-0.3t); & 0<t\leq 1 \\
\exp(-0.3); & 1<t\leq2.5 \\
\exp(0.3(t-3.5)); & t>2.5
\end{array}
\right. $

The censoring time is from the exponential distribution with mean 2.5.

For a simple baseline hazard function(e.g. $\lambda_0(t)=\lambda$), i used the following algorithm to generate survival times

$$T=H_0^-\{-\log(u) \times \exp(\gamma_1X_1+\gamma_2X_2)\}
=\lambda^-\{-\log(u) \times \exp(\gamma_1X_1+\gamma_2X_2))\}$$

where $u\sim$Uniform(0,1). The R codes are as follows

gamma <- c(-0.5, 1.5)
X1 <- rbinom(n=200,size=1,prob=.5)
X2 <- round(runif(n=200,0,1),2)
time <- (1/.3)*(-log(runif(200))/exp(gamma[1]*X1+gamma[2]*X2))
cen <- rexp(200,rate=1/2.5)
event <- as.numeric(time <= cen)

But i'm struggling to generate for the piecewise baseline hazard function that i mentioned above. Would anyone please help me to generate survival times for the above settings using R.

Best Answer

First: you can sample directly from any survival function, $S(t)$ which shows the time-dependent probability of living to that time or longer. The way to do this is by generating uniform RVs $u$ as quantiles and finding $S^{-1}(u)$. This can be done analytically, or below I have an example of how to do it numerically with a pseudocontinuous or discrete time using colSums(outer(x,y,'<')) which beats quantile by many flops.

Second: the survival function is related to the hazard via: $S(t) = exp(-\Lambda(t))$ where $\Lambda(t) = \int_{0}^t \lambda(s) ds$ is called the cumulative hazard function.

So for simplicity let's sample just from the baseline hazard function, omitting any influence of covariates. As a note, the influence of covariates can be added back by generating survival curves for each individual in the sample by multiplying the hazard function by their exponentiated linear predictor. The cumulative hazard could be found analytically, but a numerical approach with a range of possible failure times is given by:

tdom <- seq(0, 5, by=0.01)
haz <- rep(0, length(tdom))
haz[tdom <= 1] <- exp(-0.3*tdom[tdom <= 1])
haz[tdom > 1 & tdom <= 2.5] <- exp(-0.3)
haz[tdom > 2.5] <- exp(0.3*(tdom[tdom > 2.5] - 3.5))
cumhaz <- cumsum(haz*0.01)
Surv <- exp(-cumhaz)
par(mfrow=c(3,1))
plot(tdom, haz, type='l', xlab='Time domain', ylab='Hazard')
plot(tdom, cumhaz, type='l', xlab='Time domain', ylab='Cumulative hazard')
plot(tdom, Surv, type='l', xlab='Time domain', ylab='Survival)

# generate 100 random samples:
u <- runif(100)
failtimes <- tdom[colSums(outer(Surv, u, `>`))]

dev.off()
library(survival)
plot(survfit(Surv(failtimes)~1))

Gives:

enter image description here enter image description here

Related Question