Solved – How to interpret the output of predict.coxph

cox-modelpredictive-modelsrelative-risk

After fitting a coxmodel it is possible to make predictions and retrieve the relative risk of new data. What I don't understand is how the relative risk is computed for an individual and what is it relative to (i.e. the average of the population)? Any recommendations for resources to help understand (I not very advanced in survival analysis so the simpler the better)?

Best Answer

Edit: the following description applies to survival versions 3.2-8 and below. Starting with version 3.2-9, the default behavior of predict.coxph() changes with respect to treating 0/1 (dummy indicator) variables. See NEWS.

predict.coxph() computes the hazard ratio relative to the sample average for all $p$ predictor variables. Factors are converted to dummy predictors as usual whose average can be calculated. Recall that the Cox PH model is a linear model for the log-hazard $\ln h(t)$:

$$ \ln h(t) = \ln h_{0}(t) + \beta_{1} X_{1} + \dots + \beta_{p} X_{p} = \ln h_{0}(t) + \bf{X} \bf{\beta} $$

Where $h_{0}(t)$ is the unspecified baseline hazard. Equivalently, the hazard $h(t)$ is modeled as $h(t) = h_{0}(t) \cdot e^{\beta_{1} X_{1} + \dots + \beta_{p} X_{p}} = h_{0}(t) \cdot e^{\bf{X} \bf{\beta}}$. The hazard ratio between two persons $i$ and $i'$ with predictor values $\bf{X}_{i}$ and $\bf{X}_{i'}$ is thus independent of the baseline hazard and independent of time $t$:

$$ \frac{h_{i}(t)}{h_{i'}(t)} = \frac{h_{0}(t) \cdot e^{\bf{X}_{i} \bf{\beta}}}{h_{0}(t) \cdot e^{\bf{X}_{i'} \bf{\beta}}} = \frac{e^{\bf{X}_{i} \bf{\beta}}}{e^{\bf{X}_{i'} \bf{\beta}}} $$

For the estimated hazard ratio between persons $i$ and $i'$, we just plug in the coefficient estimates $b_{1}, \ldots, b_{p}$ for the $\beta_{1}, \ldots, \beta_{p}$, giving $e^{\bf{X}_{i} \bf{b}}$ and $e^{\bf{X}_{i'} \bf{b}}$.

As an example in R, I use the data from John Fox' appendix on the Cox-PH model which provides a very nice introductory text. First, we fetch the data and build a simple Cox-PH model for the time-to-arrest of released prisoners (fin: factor - received financial aid with dummy coding "no" -> 0, "yes" -> 1, age: age at the time of release, prio: number of prior convictions):

> URL   <- "https://socialsciences.mcmaster.ca/jfox/Books/Companion/data/Rossi.txt"
> Rossi <- read.table(URL, header=TRUE)                  # our data
> Rossi[1:3, c("week", "arrest", "fin", "age", "prio")]  # looks like this
  week arrest fin age prio
1   20      1  no  27    3
2   17      1  no  18    8
3   25      1  no  19   13

> library(survival)                                      # for coxph()    
> fitCPH <- coxph(Surv(week, arrest) ~ fin + age + prio, data=Rossi)    # Cox-PH model
> (coefCPH <- coef(fitCPH))                              # estimated coefficients
     finyes         age        prio 
-0.34695446 -0.06710533  0.09689320 

Now we plug in the sample averages for our predictors into the $e^{\bf{X} \bf{b}}$ formula:

meanFin  <- mean(as.numeric(Rossi$fin) - 1)   # average of financial aid dummy
meanAge  <- mean(Rossi$age)                   # average age
meanPrio <- mean(Rossi$prio)                  # average number of prior convictions
rMean <- exp(coefCPH["finyes"]*meanFin        # e^Xb
           + coefCPH["age"]   *meanAge
           + coefCPH["prio"]  *meanPrio)

Now we plug in the predictor values of the first 4 persons into the $e^{\bf{X} \bf{b}}$ formula.

r1234 <- exp(coefCPH["finyes"]*(as.numeric(Rossi[1:4, "fin"])-1)
           + coefCPH["age"]   *Rossi[1:4, "age"]
           + coefCPH["prio"]  *Rossi[1:4, "prio"])

Now calculate the relative risk for the first 4 persons against the sample average and compare to the output from predict.coxph().

> r1234 / rMean
[1] 1.0139038 3.0108488 4.5703176 0.7722002

> relRisk <- predict(fitCPH, Rossi, type="risk")   # relative risk
> relRisk[1:4]
        1         2         3         4 
1.0139038 3.0108488 4.5703176 0.7722002

If you have a stratified model, the comparison in predict.coxph() is against the strata-averages, this can be controlled via the reference option that is explained in the help page.

Related Question