Solved – Estimate Survival Function from hazard function — an inconsistent result between basehaz and survfit function

rsurvival

I am trying to caculate survival function in a time dependent covariates Cox model from its baseline hazard function. However, my program gives a different result compared with survfit. Based on the formula

$S(t)=\exp(-\int_{0}^{t}\lambda_{0}(\mu)\exp[\hat{\beta}Z(\mu)]d\mu$

result should match. I just need this program for illustration but speed. The data I used can be download here and program have been attached. Figure 1 shown the graph generated by the program below.

library(survival)
fit.cox  <- coxph( Surv(t1,t2,event) ~ x, data = sim$data)
lambda0 <- basehaz(fit.cox, centered = F)  # Estimated Baseline Hazard Function

# Caculate Survival Function
t.cut   <- sim$t.cut
    lambda0 <- rbind(lambda0, c(0,0) )
    x       <- sim$x[1, ]
beta    <- fit.cox$coefficients
pred    <- exp( x * beta )

#Baseline Hazard Function
lambda0.fun  <- function(t) {
    approx(x = lambda0$time, y = lambda0$hazard, xout = t)$y
}

#Hazard Function
lambda.fun <- function(t){
  which.t <- sum(t >= t.cut) 
  lambda0.fun(t) * pred[which.t]  
}

#Survival Function
survival.fun <- function(t){
  cum.hazard <- integrate(Vectorize(lambda.fun), 0, t, subdivisions = 1e3L)
  exp(- cum.hazard$value)
}

#Test Program
test <-sim$data[1,]
s.est <- Vectorize(survival.fun)(seq(0,2,0.1))
s.est2 <- survfit(fit.cox, newdata = test, id = id, 
              se.fit = F, type = "efron")
plot(s.est2, xlim = c(0,2))
points(seq(0,2,0.1), s.est)

Best Answer

After looking for the source code of basehaz.S, I've got the reason why I am wrong here. First basehaz simply compute cumulate hazard function $\Lambda_0(t)$ by using survfit instead of instantaneous hazard function $\lambda_0(t)$

Second the main code for basehaz.S is

    sfit<-survfit(fit)
    H<- -log(sfit$surv)

Then it's wrong to use this function to estimate a time dependent covariates Cox model (which require id option in survfit).

I think we should avoid to use the basehaz function becuase it exists only because Prof. Therneau try to comfort SAS programmers as he described in the document of basehaz function.

Related Question