I am trying to caculate survival function in a time dependent covariates Cox model from its baseline hazard function. However, my program gives a different result compared with survfit
. Based on the formula
$S(t)=\exp(-\int_{0}^{t}\lambda_{0}(\mu)\exp[\hat{\beta}Z(\mu)]d\mu$
result should match. I just need this program for illustration but speed. The data I used can be download here and program have been attached. Figure 1 shown the graph generated by the program below.
library(survival)
fit.cox <- coxph( Surv(t1,t2,event) ~ x, data = sim$data)
lambda0 <- basehaz(fit.cox, centered = F) # Estimated Baseline Hazard Function
# Caculate Survival Function
t.cut <- sim$t.cut
lambda0 <- rbind(lambda0, c(0,0) )
x <- sim$x[1, ]
beta <- fit.cox$coefficients
pred <- exp( x * beta )
#Baseline Hazard Function
lambda0.fun <- function(t) {
approx(x = lambda0$time, y = lambda0$hazard, xout = t)$y
}
#Hazard Function
lambda.fun <- function(t){
which.t <- sum(t >= t.cut)
lambda0.fun(t) * pred[which.t]
}
#Survival Function
survival.fun <- function(t){
cum.hazard <- integrate(Vectorize(lambda.fun), 0, t, subdivisions = 1e3L)
exp(- cum.hazard$value)
}
#Test Program
test <-sim$data[1,]
s.est <- Vectorize(survival.fun)(seq(0,2,0.1))
s.est2 <- survfit(fit.cox, newdata = test, id = id,
se.fit = F, type = "efron")
plot(s.est2, xlim = c(0,2))
points(seq(0,2,0.1), s.est)
Best Answer
After looking for the source code of
basehaz.S
, I've got the reason why I am wrong here. Firstbasehaz
simply compute cumulate hazard function $\Lambda_0(t)$ by usingsurvfit
instead of instantaneous hazard function $\lambda_0(t)$Second the main code for
basehaz.S
isThen it's wrong to use this function to estimate a time dependent covariates Cox model (which require
id
option insurvfit
).I think we should avoid to use the
basehaz
function becuase it exists only because Prof. Therneau try to comfort SAS programmers as he described in the document ofbasehaz
function.