Why cox.zph tests using the sum of the coefficient and the scaled Schoenfeld residuals

inference, mathematical-statistics, survival

I am currently reading Dirk Moore's book on Applied Survival Analysis Using R.

The chapter deals with proportional hazards model assumptions. Let $r$ be the scaled Schoenfeld residual for some covariate $z$ with coefficient $\beta$. (Assume for simplicity that the model is $h_1=h_0e^{\beta z}$, where the $h_i$ are hazard functions.)

The book says "$E[r]=\beta+\beta(t)$", where $\beta$ can be estimated by the MLE $\hat{\beta}$ under the assumption $h_1=h_0e^{\beta z}$. If I apply the method of moments naively, I should read this as $r_i \approx \beta+\beta(t_i)$, where $r_i$ is the scaled residual at the $i$-th failure time $t_i$. To test for constant $\beta$, one would then consider the test statistic $r-\hat{\beta}$, which is exactly what Section 1 of Therneau and Grambsch's 1994 paper reflects. However, in the book and in cox.zph, the statistic appears to be $r+\hat{\beta}$.

In the R code on page 98, the book does the following. I do not understand why `resid.scaled + result.coxph$coef` agrees with `resid.sch$y`.

> library(survival)
> tt <- c(6, 7, 10, 15, 19, 25)
> delta <- c(1, 0, 1, 1, 0, 1)
> trt <- c(0, 0, 1, 0, 1, 1)
> result.coxph <- coxph(Surv(tt, delta) ~ trt)
> result.coxph$coef
> resid.unscaled <- residuals(result.coxph, type="schoenfeld")
> resid.scaled <- resid.unscaled*result.coxph$var*sum(delta)
> resid.scaled + result.coxph$coef     #computation by hand
[1] -2.639193 2.157647 -3.496841 -1.326129
> resid.sch <- cox.zph(result.coxph) #cox.zph testing 
> resid.sch$y    #cox.zph computation
         trt
6  -2.639193
10  2.157647
15 -3.496841
25 -1.326129
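
For completeness, the agreement can be checked directly. Here is a minimal sketch (it assumes the survival package; note that cox.zph's internals were revised in survival 3.0, so exact agreement with this by-hand scaling may depend on the package version):

library(survival)

tt    <- c(6, 7, 10, 15, 19, 25)
delta <- c(1, 0, 1, 1, 0, 1)
trt   <- c(0, 0, 1, 0, 1, 1)
fit   <- coxph(Surv(tt, delta) ~ trt)

# scale the Schoenfeld residuals by hand: r * Var(beta-hat) * (number of events)
r.scaled <- residuals(fit, type = "schoenfeld") * as.numeric(fit$var) * sum(delta)

# cox.zph's y should then equal the scaled residuals shifted by beta-hat
zph <- cox.zph(fit)
all.equal(unname(r.scaled + coef(fit)), as.vector(zph$y))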

What did I miss here? Why do cox.zph and the book's R code use $r+\hat{\beta}$ as the test statistic? Normally one wants a pivotal statistic. $E[r+\hat{\beta}]=2\beta+\beta(t)$, which does not seem to be pivotal even after division by its standard error under the null $\beta(t)=0$, whereas $E[r-\hat{\beta}]=\beta(t)$.

Best Answer

Why do cox.zph and the book's R code use $r + \hat \beta$ as the test statistic? Normally one wants a pivotal statistic.

I don't have a copy of Moore's book handy, but Therneau and Grambsch explain it this way in Modeling Survival Data: Extending the Cox Model, Section 6.2:

if $\hat \beta$ is the coefficient from an ordinary fit of the Cox model, then $$ E(s_{kj}^*) + \hat \beta_j \approx \beta_j(t_k)$$ where $s_k^*$ is the scaled Schoenfeld residual... This suggests plotting $s_{kj}^* + \hat \beta_j$ versus time, or some function of time $g(t)$, as a method for visualizing the nature and extent of nonproportional hazards. A line can be fit to the plot followed by a test for zero slope; a nonzero slope is evidence against proportional hazards.

Here, $k$ is an index of event times and $j$ is an index of predictors. Note that the statistical test they propose is on the slope of that curve, so the offset $\hat \beta_j$ doesn't matter. Also note that they mention "visualizing the nature and extent of nonproportional hazards" before they describe the statistical test.
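
To see concretely why the offset is irrelevant to the test, note that adding a constant to the response in a simple linear regression changes only the intercept, never the slope. A quick sketch with the toy data from the question (any coxph fit would do; this mimics the pre-3.0 cox.zph approach of regressing scaled residuals on time):

library(survival)

tt    <- c(6, 7, 10, 15, 19, 25)
delta <- c(1, 0, 1, 1, 0, 1)
trt   <- c(0, 0, 1, 0, 1, 1)
fit   <- coxph(Surv(tt, delta) ~ trt)

r.scaled <- residuals(fit, type = "schoenfeld") * as.numeric(fit$var) * sum(delta)
times    <- as.numeric(names(r.scaled))   # event times

# the slope estimate (and hence the zero-slope test) is identical either way;
# only the intercept shifts by beta-hat
coef(lm(r.scaled ~ times))
coef(lm(I(r.scaled + coef(fit)) ~ times))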

That plot for each predictor $j$ (with a loess smooth superimposed) shows the deviations around the overall $\hat \beta_j$ arising from the $s_{kj}^*$. Particularly with large data sets, what matters might be not so much whether there's a "statistically significant" violation of proportional hazards, but whether the deviations are large enough to have practical significance. That's best examined visually, and the standard plot shows it nicely.
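
In practice that visual check is a one-liner. A minimal sketch on the veteran data that ships with the survival package (karno is a classic example of nonproportional hazards; the toy data above has too few events to smooth):

library(survival)

fit <- coxph(Surv(time, status) ~ trt + karno, data = veteran)
zph <- cox.zph(fit)
zph        # per-predictor and global tests for nonzero slope
plot(zph)  # one panel per predictor: beta_j(t) with a smooth,
           # showing the deviations around the overall estimate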

For what it's worth, although I appreciate the value of a pivotal statistic, a search through my electronic copy of Therneau and Grambsch shows that the words "pivot" and "pivotal" never appear there.