Here is a toy survival model with age only. But I still do not know how the surv
numbers in survfit
are calculated: $\exp(\beta x)$ is exp(fit$coefficients*60)=1.135894
then what's the next step? How to get $h_0(t)$? All the resources say $h_0(t)$ is non-parametric, but how it was calculated?
library(survival)
fit<-coxph(Surv(time, status)~age, data=kidney)
pred=summary(survfit(fit,newdata=data.frame(age=60)))
head(cbind(time=pred$time,surv=pred$surv))
time surv
[1,] 2 0.9859540
[2,] 7 0.9562571
[3,] 8 0.9266329
[4,] 9 0.9114153
[5,] 12 0.8810365
[6,] 13 0.8659015
Best Answer
Model
Goal
Obtain an estimate of the survival function over time at 60 years old, $\hat{S}(\cdot \,|\, \text{age} = 60)$.
Solution #1
summary(survfit(fit, newdata = data.frame(age = 60)))
Solution #2
As explained in this post,
basehaz(fit, centered = FALSE)
returns a non-parametric estimate of the cumulative baseline hazard, $\hat{H}_0(t)$:Thus, an estimate of the cumulative hazard at 60 years old can be obtained by $\hat{H}(t) = \hat{H}_0(t) \exp(60 \, \beta_{\text{age}})$:
Finally, an estimate of the survival function is $\hat{S}(t) = \exp(-\hat{H}(t))$:
Remark
Note that the second method returns the estimate of the survival at every time point found in the data set while the first method returns the estimate of the survival only at the event times.