I have assembled a multivariable Cox proportional hazard model like this:
res.cox <-
coxph(Surv(time, status) ~
AGE +
B +
C,
data = data)
When I use cox.zph(res.cox)
I get a violation for proportional hazard assumptions for the first two variables (and a global violation):
chisq df p
AGE 1.41e+01 1 0.00017
B 2.10e+02 1 < 2e-16
C 2.99e-03 1 0.95638
GLOBAL 2.15e+02 3 < 2e-16
But when I draw graphs of the scaled Schoenfeld residuals against the transformed time using ggcoxzph(cox.zph(res.cox))[1]
, the CI overlaps 0 all the way (first variable for impression, its the same for the others):
It seems to me that this contradicts every issue and article I found:
- How to interpret the schoenfeld residuals plot
- Schoenfeld residuals – Plain English explanation, please!
- http://www.sthda.com/english/wiki/cox-model-assumptions
What am I doing wrong?
BTW, this is the summary for res.cox
:
n= 2521, number of events= 2424
coef exp(coef) se(coef) z Pr(>|z|)
AGE 0.005123 1.005136 0.001574 3.254 0.00114 **
B -0.677403 0.507935 0.039421 -17.184 < 2e-16 ***
B -0.122768 0.884469 0.051108 -2.402 0.01630 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
AGE 1.0051 0.9949 1.0020 1.0082
B 0.5079 1.9688 0.4702 0.5487
C 0.8845 1.1306 0.8002 0.9777
Concordance= 0.648 (se = 0.007 )
Likelihood ratio test= 443.2 on 3 df, p=<2e-16
Wald test = 315.5 on 3 df, p=<2e-16
Score (logrank) test = 304.6 on 3 df, p=<2e-16
UPDATED 14/2/2023:
As @EdM kindly mentioned, the problem was a bug in the 'survminer' package. Luckily, the repo has now been updated, and the PR with the solution was merged.
Example
plot(cox.zph(res.cox), var='AGE')
shows:
and ggcoxzph(cox.zph(res.cox))[1]
shows:
The CRAN package has yet to be updated, you can install the latest version from GitHub as mentioned in the repo:
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/survminer", build_vignettes = FALSE)
library("survminer")
Best Answer
The main thing you are doing wrong is using a software package that might still have a long-standing coding error. That error makes its Schoenfeld residual plots essentially useless. See this page for an explanation and a way to mend the code, if you want to stay with that package. I generally stick with the standard plotting provided by the
survival
package.The other issue is your very large data set with over 2000 events. With that many events you can easily have "statistically significant" violations of proportional hazards that are of little practical significance. The "practical significance" is a judgment call that will be easier to make once you have a useful plot.
Also, an improperly modeled continuous predictor can show proportional hazards violations that are fixed when the predictor is modeled appropriately (e.g., with a regression spline). That could be what's going on with
AGE
, and withB
ifB
is continuous.