R Survival – How to Manually Calculate `survfit` in Cox Hazard Model

cox-modelproportional-hazardsrsurvival

Here is a toy survival model with age only. But I still do not know how the surv numbers in survfit are calculated: $\exp(\beta x)$ is exp(fit$coefficients*60)=1.135894 then what's the next step? How to get $h_0(t)$? All the resources say $h_0(t)$ is non-parametric, but how it was calculated?

library(survival)
fit<-coxph(Surv(time, status)~age, data=kidney)
pred=summary(survfit(fit,newdata=data.frame(age=60)))
head(cbind(time=pred$time,surv=pred$surv))
     time      surv
[1,]    2 0.9859540
[2,]    7 0.9562571
[3,]    8 0.9266329
[4,]    9 0.9114153
[5,]   12 0.8810365
[6,]   13 0.8659015

Best Answer

Model

library(survival)
fit <- coxph(Surv(time, status) ~ age, data = kidney)

Goal

Obtain an estimate of the survival function over time at 60 years old, $\hat{S}(\cdot \,|\, \text{age} = 60)$.

Solution #1

summary(survfit(fit, newdata = data.frame(age = 60)))

Solution #2

As explained in this post, basehaz(fit, centered = FALSE) returns a non-parametric estimate of the cumulative baseline hazard, $\hat{H}_0(t)$:

H0 <- basehaz(fit, centered = FALSE)

Thus, an estimate of the cumulative hazard at 60 years old can be obtained by $\hat{H}(t) = \hat{H}_0(t) \exp(60 \, \beta_{\text{age}})$:

H <- H0$hazard * exp(60 * fit$coefficients)

Finally, an estimate of the survival function is $\hat{S}(t) = \exp(-\hat{H}(t))$:

S <- exp(- H) 

> data.frame(time = H0$time, Surv = S)
   time        Surv
1     2 0.985953977
2     4 0.985953977
3     5 0.985953977
4     6 0.985953977
5     7 0.956257141

Remark

Note that the second method returns the estimate of the survival at every time point found in the data set while the first method returns the estimate of the survival only at the event times.

Related Question