Data Visualization – How to Interpret Schoenfield Residual Plot in R

data visualizationrsurvival

I used the veteran database of the package "survival" to do survival analysis. I tried to check the proportional hazard assumption, with with schoenfied residual plot. I always thought if we had a flat line this mean the covariate effect are constant over time, and therfore the proportional hazard assumption were verified. But I see that in these graphs below all lines are flat, although the p.value show the opposite. Finally I wonder how the shoenfield residuals are visually interpreted.

enter image description here

Best Answer

This question illustrates both some weaknesses and a strength of using open-source statistical software. These plots were evidently generated by the ggcoxzph() function in the R survminer package.

The main weakness: a coding problem. Although that function is supposed to be a wrapper around the standard R survival:::plot.cox.zph() function to generate ggplot() output, the y-axis limits are much broader than those seen in standard plots; see corresponding plots for similar models on this data set in Section 3.5.2 of the R survival vignette. Those broad limits are needed to include what seem to be displayed as "confidence bands," but they make it impossible to see details of the smoothed line for scaled Schoenfeld residuals versus time.

The strength: you can find and fix the coding problem. With open-source R, the code for ggcoxzph() and for survival:::plot.cox.zph() is open for inspection. The wide "confidence bands" focus attention on how their display might differ between the 2 functions. In the recent survminer_0.4.9 version of ggcoxzph(), a critical line of code for calculating those bands differs from plot.cox.zph() by a multiplicative factor. The offending line:

seval <- d * ((pmat %*% xtx) * pmat) %*% rep(1, df) 

where d is earlier set to the number of events and d * is not found in the survival:::plot.cox.zph() code. Removing that addition to the code allows plots with reasonable y-axis limits and correct confidence bands.

Fixed ggcoxzph

The next weakness: Even with that fix, it's hard to see details of the shapes of the smoothed curves, as all y axes are compressed to fit plots for all predictors into a single display. That's evidently the default for that function--it's important to question such default settings even if there aren't errors in code. The original plot.cox.zph() produces plots for one predictor at a time, which allows for better visualization. Here's the example for celltype:

Standard cox.zph plot for "celltype" predictor

With the physically larger y axis, details of the curve are much more apparent. Remember that those visual curve details depend on another parameter setting, the transformation of the time axis chosen in the cox.zph() function. The plots shown here are for the default "KM" (Kaplan-Meier) transformation, but there are other options and any function of one argument can be specified.

Code:

library(survival)
library(ggplot2)
library(survminer)
## survival model and cox.zph
vet1 <-coxph(Surv(time,status)~trt+karno+age+celltype+diagtime+prior,data=veteran)
z1 <- cox.zph(vet1)
## remove d * from ggcoxzph() code, save as function ggcoxzphFixed, not shown
gg1 <- ggcoxzphFixed(z1)
gg1 ## provides plots with correct confidence bands and improved y axes
plot(z1[4]) ## standard plot for the celltype predictor
Related Question