Solved – How to interpret a Cox hazard model survival curve

Tags: cox-model, r, survival

How do you interpret a survival curve from a Cox proportional hazards model?

In this toy example, suppose we fit a Cox proportional hazards model with age as the only covariate to the kidney data and plot the resulting survival curve.

library(survival)
fit <- coxph(Surv(time, status)~age, data=kidney)
plot(conf.int="none", survfit(fit))
grid()

[Figure: survival curve produced by the code above]

For example, at time $200$, which statement is true? Or are both wrong?

  • Statement 1: 20% of the subjects will be left (e.g., if we start with $1000$ people, approximately $200$ should remain by day $200$).

  • Statement 2: A given person has a $20\%$ chance of surviving to day $200$.


My attempt: I do not think the two statements are equivalent (correct me if I am wrong), since we do not have the i.i.d. assumption (the survival times of different people are NOT drawn independently from a single distribution). As with logistic regression in my question here, each person's hazard rate depends on $\beta^Tx$ for that person.

Best Answer

Since the hazard depends on the covariates, so does the survival function. The model assumes that the hazard function of an individual with covariate vector $x$ is $$ h(t;x) = h_0(t) e^{\beta'x}. $$

Hence, the cumulative hazard of this individual is $$ H(t;x) = \int_0^t h(u;x) du=\int_0^t h_0(u) e^{\beta'x} du = H_0(t)e^{\beta'x}, $$ where $H_0(t)=\int_0^t h_0(u) du$ is the baseline cumulative hazard.

The survival function for an individual with covariate vector $x$ is in turn $$ S(t;x) = e^{-H(t;x)}=e^{-H_0(t) e^{\beta'x}}=S_0(t)^{e^{\beta'x}}, $$ where $S_0(t) = e^{-H_0(t)}$ is the baseline survival function.

Given estimates $\hat\beta$ and $\hat S_0(t)$ of the regression coefficients and the baseline survival function, an estimate of the survival function for an individual with covariate vector $x$ is given by $\hat S(t;x)=\hat S_0(t)^{e^{\hat\beta'x}}$.
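As a rough check of this formula, the sketch below rebuilds the curve for a 70-year-old by hand from the baseline cumulative hazard and compares it with what survfit returns. It assumes the package defaults (in particular that survfit computes survival as the exponential of minus the cumulative hazard); the variable names H0, S0, S70_manual and max_diff are illustrative only.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age, data = kidney)

# Baseline cumulative hazard H_0(t), i.e. for age = 0 (centered = FALSE)
H0 <- basehaz(fit, centered = FALSE)

# Baseline survival S_0(t) = exp(-H_0(t))
S0 <- exp(-H0$hazard)

# Survival for a 70-year-old: S_0(t)^exp(beta * 70)
beta <- coef(fit)[["age"]]
S70_manual <- S0^exp(beta * 70)

# The same curve as computed by survfit()
sf70 <- survfit(fit, newdata = data.frame(age = 70))

# Largest pointwise discrepancy; expected to be at floating-point level
max_diff <- max(abs(S70_manual - sf70$surv))
max_diff
```

If the assumption about the defaults holds, max_diff should be negligibly small, which would confirm that the curve is just the baseline curve raised to the power $e^{\hat\beta'x}$.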

To compute this in R, specify the values of your covariates in the newdata argument. For example, if you want the survival function for individuals with age=70, run

plot(survfit(fit, newdata=data.frame(age=70)))
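To see how the curve moves with the covariate, newdata can hold several rows, giving one curve per row. This is a minimal sketch; the ages 30, 50 and 70 are arbitrary illustrative choices, not values from the question.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age, data = kidney)

# One row per hypothetical individual (ages chosen for illustration only)
ages <- data.frame(age = c(30, 50, 70))
sf_ages <- survfit(fit, newdata = ages)

# sf_ages$surv is a matrix with one column per row of newdata
dim(sf_ages$surv)

# Overlay the three curves, without confidence bands
plot(sf_ages, col = 1:3, conf.int = FALSE,
     xlab = "Time", ylab = "Estimated survival probability")
legend("topright", legend = paste("age =", ages$age), col = 1:3, lty = 1)
grid()
```

Each curve is the same baseline curve raised to a different power, so the curves never cross; that is the proportional hazards assumption made visible.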

If you omit the newdata argument, as you did, it defaults to the average values of the covariates in the sample (see ?survfit.coxph). So what is shown in your graph is an estimate of $S_0(t)^{e^{\beta'\bar x}}$, that is, the survival function of a hypothetical individual whose age equals the sample mean.
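The following sketch checks that default behaviour on the kidney data, and also computes the average of the individual curves $\hat S(t;x_i)$ over the sample, which is a different quantity from the curve at the mean age. The names sf_default, sf_mean, sf_each and S_average are illustrative, and the comparison assumes the default curve is evaluated at the sample mean of age as described in ?survfit.coxph.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age, data = kidney)

# Default curve (no newdata) versus an explicit curve at the sample mean age
sf_default <- survfit(fit)
sf_mean <- survfit(fit, newdata = data.frame(age = mean(kidney$age)))
diff_default <- max(abs(sf_default$surv - sf_mean$surv))
diff_default

# For contrast: one curve per observation, then averaged at each time point
sf_each <- survfit(fit, newdata = kidney["age"])
S_average <- rowMeans(sf_each$surv)

# Compare the curve at the mean age with the average of the individual curves
plot(sf_default$time, sf_default$surv, type = "s",
     xlab = "Time", ylab = "Estimated survival probability")
lines(sf_default$time, S_average, type = "s", lty = 2)
legend("topright", legend = c("curve at mean age", "average of individual curves"),
       lty = 1:2)
```

The solid line describes one hypothetical person of average age, while the dashed line is closer in spirit to the fraction of a cohort with this age distribution expected to remain. In general the two need not coincide, because $S_0(t)^{e^{\beta'x}}$ is nonlinear in $x$.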