Solved – How to interpret a Cox hazard model survival curve

cox-modelrsurvival

How do you interpret a survival curve from cox proportional hazard model?

In this toy example, suppose we have a cox proportional hazard model on age variable in kidney data, and generate the survival curve.

library(survival)
fit <- coxph(Surv(time, status)~age, data=kidney)
plot(conf.int="none", survfit(fit))
grid()

For example, at time $200$, which statement is true? or both are wrong?

Statement 1: we will have 20% subjects left (e.g., if we have $1000$ people, by day $200$, we should have approximately $200$ left),
Statement 2: For one given person, he/she has a $20\%$ chance to survive at day $200$.

My attempt: I do not think the two statements are the same (correct me if I am wrong), since we do not have the i.i.d. assumption (survival time for all people is NOT drawing from one distribution independently). It is similar to logistic regression in my question here, each person's hazard rate depends on $\beta^Tx$ for that person.

Best Answer

Since the hazard depends on the covariates, so does the survival function. The model assumes that the hazard function of an individual with covariate vector $x$ is $$ h(t;x) = h_0(t) e^{\beta'x}. $$ Hence, the cumulative hazard of this individual is $$ H(t;x) = \int_0^t h(u;x) du=\int_0^t h_0(u) e^{\beta'x} du = H_0(t)e^{\beta'x}, $$ where we may define $H_0(t)=\int_0^t h_0(u) du$ as the baseline cumulative hazard. The survival function for an individual with covariate vector $x$ is in turn $$ S(t;x) = e^{-H(t;x)}=e^{-H_0 e^{\beta'x}}=S_0(t)^{e^{\beta'x}} $$ where we define $S_0(t) = e^{-H_0(t)}$ as the baseline survival function.

Given estimates $\hat\beta$ and $\hat S_0(t)$ of the regression coefficients and the baseline survival function, an estimate the survival function for an individual with covariate vector $x$ is given by $\hat S(t;x)=\hat S_0(t)^{e^{\hat\beta'x}}$.

To compute this in R you specify the value of your covariates in the newdata argument. For example if you want the survival function for individuals of age=70, do

plot(survfit(fit, newdata=data.frame(age=70)))

If you, as you do, omit the newdata argument, its default value equals the average values of the covariates in the sample (see ?survfit.coxph). So what is shown in your graph is an estimate of $S_0(t)^{e^{\beta'\bar x}}$.

Related Solutions

Solved – Change-point in Cox survival model

You need to make days_ind a time-dependent variable. The way you have it coded right now, everybody whose observation (whether event or censoring time) was after 110 days will have experienced a different hazard throughout their entire followup then those whose observation is before 110 days. What you want to have is for the hazard to "jump" at 110 days.

It is not completely straightforward to set up this analysis in the survival package. You have to split the follow-up period of each person into two periods: up to 110 days and after that. Anybody surviving beyond 110 days would have two observations: one right-censored at 110, and the other left-truncated at 110 and having the actual event on the right side. Fortunately, there is a function to do exactly that: survSplit.

Here is a quick example with a built-in dataset:

> library(survival)
> aml$id <- 1:nrow(aml)  # add a subject ID variable
> aml2 <- survSplit(aml,cut=10,end="time",start="start", event="status", episode="period")
> 
> subset(aml, id<=3)
  time status          x id
1    9      1 Maintained  1
2   13      1 Maintained  2
3   13      0 Maintained  3
> subset(aml2, id<=3)
   time status          x id start period
1     9      1 Maintained  1     0      0
2    10      0 Maintained  2     0      0
3    10      0 Maintained  3     0      0
25   13      1 Maintained  2    10      1
26   13      0 Maintained  3    10      1

You can see that there are now two observations for id's 2 and 3. The period variable corresponds to your days_ind.

From here you can build the model you want, but you have to code the effects carefully, because the effect of period cannot be estimated, since this it refers to different times.

> fit <- coxph(Surv(start, time, status) ~ 
   I((x=="Maintained")&(period==0)) + I((x=="Maintained")&(period==1)), data=aml2)
> fit
Call:
coxph(formula = Surv(start, time, status) ~ I((x == "Maintained") & 
    (period == 0)) + I((x == "Maintained") & (period == 1)), 
    data = aml2)


                                             coef exp(coef) se(coef)     z    p
I((x == "Maintained") & (period == 0))TRUE -1.498     0.224    1.120 -1.34 0.18
I((x == "Maintained") & (period == 1))TRUE -0.722     0.486    0.591 -1.22 0.22

Likelihood ratio test=3.79  on 2 df, p=0.150  n= 41, number of events= 18

Here the two coefficients measure the effect of Maintained vs Non-maintained before 10 days and after 10 days, respectively.

You could also consider using the cmprsk package. It is designed for analysis of competing risks, but there is nothing stopping you from using it for only one outcome. The benefit is that it has an easier way of defining time-dependent covariates (though a really awkward syntax overall):

> library(cmprsk)
> fit1 <- with(aml, crr(time, status, cov1=I(x=="Maintained"), cov2=I(x=="Maintained"), 
+                      tf=function(t)I(t<=10)))
> fit1
convergence:  TRUE 
coefficients:
    I(x == "Maintained")1 I(x == "Maintained")1*tf1 
                  -0.7213                   -0.7387 
standard errors:
[1] 0.5259 1.1500
two-sided p-values:
    I(x == "Maintained")1 I(x == "Maintained")1*tf1 
                     0.17                      0.52

Note that with the different coding, the meaning of the coefficients is not exactly the same as above.

Survival Analysis – How to Create a Toy Survival (Time to Event) Data with Right Censoring

It is not clear to me how you generate your event times (which, in your case, might be $<0$) and event indicators:

time = rnorm(n,10,2) 
S_prob = S(time)
event = ifelse(runif(1)>S_prob,1,0)

So here is a generic method, followed by some R code.

Generating survival times to simulate Cox proportional hazards models

To generate event times from the proportional hazards model, we can use the inverse probability method (Bender et al., 2005): if $V$ is uniform on $(0, 1)$ and if $S(\cdot \,|\, \mathbf{x})$ is the conditional survival function derived from the proportional hazards model, i.e. $$ S(t \,|\, \mathbf{x}) = \exp \left( -H_0(t) \exp(\mathbf{x}^\prime \mathbf{\beta}) \vphantom{\Big(} \right) $$ then it is a fact that the random variable $$ T = S^{-1}(V \,|\, \mathbf{x}) = H_0^{-1} \left( - \frac{\log(V)}{\exp(\mathbf{x}^\prime \mathbf{\beta})} \right) $$ has survival function $S(\cdot \,|\, \mathbf{x})$. This result is known as ``the inverse probability integral transformation''. Therefore, to generate a survival time $T \sim S(\cdot \,|\, \mathbf{x})$ given the covariate vector, it suffices to draw $v$ from $V \sim \mathrm{U}(0, 1)$ and to make the inverse transformation $t = S^{-1}(v \,|\, \mathbf{x})$.

Example [Weibull baseline hazard]

Let $h_0(t) = \lambda \rho t^{\rho - 1}$ with shape $\rho > 0$ and scale $\lambda > 0$. Then $H_0(t) = \lambda t^\rho$ and $H^{-1}_0(t) = (\frac{t}{\lambda})^{\frac{1}{\rho}}$. Following the inverse probability method, a realisation of $T \sim S(\cdot \,|\, \mathbf{x})$ is obtained by computing $$ t = \left( - \frac{\log(v)}{\lambda \exp(\mathbf{x}^\prime \mathbf{\beta})} \right)^{\frac{1}{\rho}} $$ with $v$ a uniform variate on $(0, 1)$. Using results on transformations of random variables, one may notice that $T$ has a conditional Weibull distribution (given $\mathbf{x}$) with shape $\rho$ and scale $\lambda \exp(\mathbf{x}^\prime \mathbf{\beta})$.

R code

The following R function generates a data set with a single binary covariate $x$ (e.g. a treatment indicator). The baseline hazard has a Weibull form. Censoring times are randomly drawn from an exponential distribution.

# baseline hazard: Weibull

# N = sample size    
# lambda = scale parameter in h0()
# rho = shape parameter in h0()
# beta = fixed effect parameter
# rateC = rate parameter of the exponential distribution of C

simulWeib <- function(N, lambda, rho, beta, rateC)
{
  # covariate --> N Bernoulli trials
  x <- sample(x=c(0, 1), size=N, replace=TRUE, prob=c(0.5, 0.5))

  # Weibull latent event times
  v <- runif(n=N)
  Tlat <- (- log(v) / (lambda * exp(x * beta)))^(1 / rho)

  # censoring times
  C <- rexp(n=N, rate=rateC)

  # follow-up times and event indicators
  time <- pmin(Tlat, C)
  status <- as.numeric(Tlat <= C)

  # data set
  data.frame(id=1:N,
             time=time,
             status=status,
             x=x)
}

Test

Here is some quick simulation with $\beta = -0.6$:

set.seed(1234)
betaHat <- rep(NA, 1e3)
for(k in 1:1e3)
{
  dat <- simulWeib(N=100, lambda=0.01, rho=1, beta=-0.6, rateC=0.001)
  fit <- coxph(Surv(time, status) ~ x, data=dat)
  betaHat[k] <- fit$coef
}

> mean(betaHat)
[1] -0.6085473

Best Answer

Related Solutions

Solved – Change-point in Cox survival model

Survival Analysis – How to Create a Toy Survival (Time to Event) Data with Right Censoring

Related Question