Cox Regression – Differences Between Residual Types in Survival Analysis


I am fairly new to survival analysis. I was advised to look up and learn about Schoenfeld residuals as part of model diagnostics, to see whether the proportional hazards assumption has been satisfied. Whilst looking this up I've seen references to many different types of residuals, including:

  • Cox-Snell
  • Deviance
  • Martingale
  • Score
  • Schoenfeld

What are the differences between these residuals and when is it recommended to use one over another? (I am happy for answers which are simply links to papers to go read.)

Best Answer

Cox-Snell residuals, $r_{Ci}$, are used to assess a model's overall goodness-of-fit. The residual for observation $i$ is the estimated cumulative hazard at its observed time, so if the model fits well the residuals behave like a censored sample from a unit exponential distribution. The fit is assessed by plotting the estimated cumulative hazard of the residuals (for example, the Nelson-Aalen estimate) against the residuals themselves. A well-fitting model gives a straight line through the origin with unit gradient. Note that it takes a particularly ill-fitting model for the Cox-Snell residuals to deviate noticeably from this line, and it is not uncommon to see slight jumps at the extremities of the plot. One criticism of Cox-Snell residuals is that they make no allowance for censoring: the residual of a censored observation understates the value it would have had at the unobserved event time. The modified Cox-Snell residuals of Crowley & Hu (1977) address this by keeping the standard residual, $r_{Ci}$, for uncensored observations and using $r_{Ci} + \Delta$ for censored observations, with $\Delta = \log(2) \approx 0.693$.
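As a minimal sketch of this check, assuming the Cox-Snell residuals have already been extracted from a fitted model, the Python below computes the Nelson-Aalen estimate of the cumulative hazard of the residuals, treating each residual as a possibly censored "time". The residual values are made up for illustration, and the helper names `nelson_aalen` and `modified_cox_snell` are hypothetical rather than taken from any library. For a well-fitting model the printed pairs should lie close to the 45-degree line.

```python
import math

def nelson_aalen(values, events):
    """Nelson-Aalen cumulative hazard at each distinct value that has an event.

    values: residuals treated as (possibly censored) times
    events: 1 if the observation is an event, 0 if censored
    Returns a list of (value, cumulative_hazard) pairs.
    """
    pairs = sorted(zip(values, events))
    n = len(pairs)
    out = []
    cum = 0.0
    i = 0
    while i < n:
        v = pairs[i][0]
        at_risk = n - i  # observations whose value is >= v
        d = 0            # number of events tied at v
        j = i
        while j < n and pairs[j][0] == v:
            d += pairs[j][1]
            j += 1
        if d > 0:
            cum += d / at_risk
            out.append((v, cum))
        i = j
    return out

def modified_cox_snell(residuals, events, delta=math.log(2)):
    """Add delta to the residuals of censored observations only."""
    return [r if e == 1 else r + delta for r, e in zip(residuals, events)]

# Illustrative (made-up) Cox-Snell residuals and event indicators.
r_c = [0.1, 0.3, 0.5, 0.9, 1.4]
delta_i = [1, 0, 1, 1, 0]

na = nelson_aalen(r_c, delta_i)
adjusted = modified_cox_snell(r_c, delta_i)
for value, cum_hazard in na:
    print(f"residual={value:.2f}  cumulative hazard={cum_hazard:.3f}")
```

In practice the pairs in `na` would be plotted with a reference line of slope 1 through the origin; systematic departure from that line suggests a poor fit.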

Martingale residuals, $r_{Mi}$, are defined as $r_{Mi} = \delta_i - r_{Ci}$, where $\delta_i$ is an indicator taking the value 0 if observation $i$ is censored and 1 if it is uncensored. Martingale residuals take values in $(-\infty, 1]$ for uncensored observations and $(-\infty, 0]$ for censored observations. They can be used to assess the functional form of a particular covariate (Therneau et al. (1990)) by plotting the residuals against that covariate. Because such plots can be noisy when there are many observations, it is often useful to overlay a LOESS curve. Martingale residuals can also be used to identify outliers, that is, observations for which the model predicts an event either too early or too late; however, the deviance residual is often better suited to this.
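The definition $r_{Mi} = \delta_i - r_{Ci}$ can be illustrated with a short Python sketch. The Cox-Snell residuals below are made-up values, and `martingale_residuals` is a hypothetical helper name. The final loop simply confirms that no residual exceeds its event indicator, so uncensored observations are at most 1 and censored observations are at most 0.

```python
def martingale_residuals(cox_snell, events):
    """Martingale residual r_M = delta - r_C for each observation."""
    return [e - r for r, e in zip(cox_snell, events)]

# Illustrative (made-up) Cox-Snell residuals and event indicators.
r_c = [0.1, 0.3, 0.5, 0.9, 1.4]
delta_i = [1, 0, 1, 1, 0]

r_m = martingale_residuals(r_c, delta_i)

for m, d in zip(r_m, delta_i):
    # Cox-Snell residuals are non-negative, so r_M can never exceed delta.
    status = "event" if d == 1 else "censored"
    print(f"{status:9s} r_M={m:+.2f}  (upper bound {d})")
```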

The deviance residual is defined as $r_{Di} = \operatorname{sgn}(r_{Mi})\sqrt{-2\left[r_{Mi} + \delta_i \log{(\delta_i-r_{Mi})}\right]}$, where $\operatorname{sgn}$ takes the value 1 for a positive martingale residual and -1 for a negative one. A residual of high absolute value is indicative of an outlier. A positive deviance residual indicates an observation for which the event occurred sooner than predicted; the converse is true for a negative residual. Unlike martingale residuals, which are heavily skewed, deviance residuals are distributed more symmetrically about 0, which makes them considerably easier to interpret when looking for outliers. One application of deviance residuals is to jackknife the dataset with just one parameter modeled and test for a significant difference in the parameter coefficient as each observation is removed. A significant change would indicate a highly influential observation.
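A small Python sketch of the transformation, assuming the standard form of the deviance residual, $r_{Di} = \operatorname{sgn}(r_{Mi})\sqrt{-2[r_{Mi} + \delta_i \log(\delta_i - r_{Mi})]}$, in which the factor $-2$ multiplies both terms. The input martingale residuals are made-up values and `deviance_residual` is a hypothetical helper name. For events the sketch assumes $r_{Mi} < 1$, i.e. a strictly positive Cox-Snell residual.

```python
import math

def deviance_residual(r_m, delta):
    """Deviance residual from a martingale residual r_m and event indicator delta.

    Assumes r_m < 1 when delta == 1, so the logarithm is defined.
    """
    # delta * log(delta - r_m) is taken as 0 for censored observations.
    log_term = math.log(delta - r_m) if delta == 1 else 0.0
    inner = -2.0 * (r_m + log_term)
    inner = max(inner, 0.0)  # guard against tiny negative values from rounding
    sign = (r_m > 0) - (r_m < 0)
    return sign * math.sqrt(inner)

# Illustrative (made-up) martingale residuals and event indicators.
r_m = [0.9, -0.3, 0.5, 0.1, -1.4]
delta_i = [1, 0, 1, 1, 0]

r_d = [deviance_residual(m, d) for m, d in zip(r_m, delta_i)]
for m, d, dev in zip(r_m, delta_i, r_d):
    print(f"delta={d}  r_M={m:+.2f}  r_D={dev:+.3f}")
```

The transformation keeps the sign of the martingale residual while pulling in its long negative tail, which is why large absolute values are easier to spot.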

Schoenfeld residuals are slightly different in that there is a separate residual for each variable, rather than a single residual per observation, and they are defined only for observations that experience an event. They are used to test the proportional hazards (PH) assumption. Grambsch and Therneau (1994) proposed that scaled Schoenfeld residuals may be more useful. By plotting the Schoenfeld residuals for each variable against event time and fitting a LOESS curve to the plot, the variable's adherence to the PH assumption can be assessed. A straight line through a residual value of 0 with gradient 0 indicates that the variable satisfies the PH assumption, i.e. that its effect does not depend on time. The PH assumption can also be assessed from the Schoenfeld residuals through a formal hypothesis test.
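To make the construction concrete, here is a minimal Python sketch of raw (unscaled) Schoenfeld residuals for a single covariate. At each event time the residual is the covariate value of the subject who had the event minus the risk-weighted average of the covariate over everyone still at risk. The data are made up, `schoenfeld_residuals` is a hypothetical helper name, tied event times are assumed absent, and the coefficient is set to 0 purely to keep the arithmetic easy to follow; it is not a fitted estimate.

```python
import math

def schoenfeld_residuals(times, events, x, beta):
    """Raw Schoenfeld residuals for one covariate x at coefficient beta.

    Assumes no tied event times.
    Returns a sorted list of (event_time, residual) pairs, one per event.
    """
    out = []
    for t_i, d_i, x_i in zip(times, events, x):
        if d_i != 1:
            continue  # censored observations have no Schoenfeld residual
        # Risk set: everyone whose observed time is at least t_i.
        risk_x = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
        weights = [math.exp(beta * x_j) for x_j in risk_x]
        expected = sum(w * x_j for w, x_j in zip(weights, risk_x)) / sum(weights)
        out.append((t_i, x_i - expected))
    return sorted(out)

# Illustrative (made-up) data: observed times, event indicators, binary covariate.
times = [2, 3, 5, 7, 11]
events = [1, 1, 0, 1, 1]
x = [1, 0, 1, 0, 1]
beta = 0.0  # hypothetical coefficient, chosen only for simple arithmetic

res = schoenfeld_residuals(times, events, x, beta)
for t, r in res:
    print(f"t={t:>2}  residual={r:+.2f}")
```

With a coefficient taken from an actual fit, these pairs are what would be plotted against time and smoothed; a trend in the smoothed curve suggests the covariate's effect changes over time.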
