Survival Analysis – Test for PH Assumption vs Visual Inspection of Schoenfeld Residuals

cox-modelproportional-hazardsschoenfeld-residualssurvival

I am inspecting the PH assumption of a Cox model, both by testing it (with cox.zph in R) and visualizing the Schoenfeld residuals. I found many references with cases where the test is significant, but looking at the residuals, it looks there is no violation. This is the case with large sample size.

I have the other way around. I ran a Cox model adjusting for multiple variables with quite a large sample size, 6000 observations, 2000 events. One of them shows no violation of PH according to the test, but looking at the plot, I am wondering if there could be something. On the the left with the data, on the right without. To me, it looks like the variability of beta(t) is quite important at the end, after 25-28 days. So in favor of a time-varying coefficient. because a log-hazard ratio of zero around 11-15 days and 1 at the end is quite substantial.

Does it make sense to have such a behaviour? And no accounting for the result of the test? Scientifcally speaking, it would also make sense to observe this change.

enter image description here

Best Answer

This might be an artifact of the way that plot.cox.zph() generates its smoothed curves. The p-values for cox.zph() do not depend on the displayed smoothed curves, so an apparent discrepancy is possible.

By default, the smoothed curve is a natural spline with 4 degrees of freedom (df = 4). That's a reasonable choice for typical survival studies with several dozen to a few hundred events. It might be too small for a large data set like yours.

Natural splines enforce linearity outside the outer knots. I don't know how plot.cox.zph() chooses the knot locations, but the behavior of the smoothed curve you show suggests that the linear part of the curve at the far right is a linear extrapolation beyond an outer knot that might not well represent the data you have.

Try specifying more knots first (by changing df in the call to plot.cox.zph()), or playing with the knot locations (if that's possible). If you still see similar behavior, then what might be going on is that the number of events at those late time points is too small to overwhelm the wavy behavior around the coefficient estimate from many earlier time points, in the score test performed and reported by cox.zph().

You could then proceed with some handling of time-dependent behavior at late times if you think it's important, but that might then affect the validity of the PH assumption for other predictors.

Related Question