I'm wondering about the Wald test when applied to regression coefficients in the Cox PH model.
In linear regression, $\sigma^2$ has to be estimated separately from the mean, so the standard error of each coefficient is itself based on an estimate. That is why the coefficients are tested with t-scores rather than Z-scores (i.e. the standard error for $\hat{\beta}_j$ is $\sqrt{s^2(X^TX)^{-1}_{jj}}$ instead of $\sqrt{\sigma^2(X^TX)^{-1}_{jj}}$).
In the Cox case, Z-scores are shown in the R output, because Wald tests are done on the coefficients. The Wald test assumes the coefficient is normally distributed: $$\frac{\hat{\beta}}{se(\hat{\beta})}\sim N(0,1)$$
but why doesn't it follow a t-distribution, since the standard error is estimated? I realize the standard errors are not computed the same way as in linear regression (although I'm not 100% clear on that), but it just seems like a t-distribution should be used. I must be missing a property of the Wald test. On Wikipedia it says "The square root of the single-restriction Wald statistic can be understood as a (pseudo) t-ratio that is, however, not actually t-distributed except for the special case of linear regression with normally distributed errors. In general, it follows an asymptotic z distribution", but I don't really understand what that means. Any help is appreciated!
Best Answer
Asymptotic theory is an important basis for Cox models and other types of models fit by maximum (partial) likelihood. The tests are based on the behavior of statistics as the sample size becomes increasingly large. In that limit, the coefficient estimates are normally distributed and the estimated standard error converges to the true one, so estimating it does not change the limiting distribution of the ratio. At finite sample sizes there's no assurance that a t distribution would hold, unlike in linear regression with normally distributed errors, where the t distribution is exact. So the tests are based on the asymptotic normality.
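To make the "only asymptotically normal" point concrete, here is a minimal simulation sketch. It does not fit a Cox model. As a stand-in (my assumption, chosen because the MLE has a closed form) it uses the rate of an exponential distribution, with the standard error taken from the observed information. The function name `wald_tail_rates` and all settings are made up for illustration. If the Wald z-statistic were exactly N(0,1) under the null, each tail would reject about 2.5% of the time at any sample size.

```python
import math
import random

Z_CRIT = 1.959963984540054  # two-sided 5% critical value of N(0,1)

def wald_tail_rates(n, true_rate=1.0, n_sims=2000, seed=0):
    """Simulate the Wald z-statistic for an exponential rate under a true null.

    Returns (share of samples with z < -Z_CRIT, share with z > Z_CRIT).
    """
    rng = random.Random(seed)
    low = high = 0
    for _ in range(n_sims):
        total = sum(rng.expovariate(true_rate) for _ in range(n))
        rate_hat = n / total              # maximum likelihood estimate
        se_hat = rate_hat / math.sqrt(n)  # from the observed information n / rate_hat^2
        z = (rate_hat - true_rate) / se_hat
        if z < -Z_CRIT:
            low += 1
        elif z > Z_CRIT:
            high += 1
    return low / n_sims, high / n_sims

if __name__ == "__main__":
    for n in (5, 50, 500):
        low, high = wald_tail_rates(n)
        print(f"n={n:4d}  P(z < -1.96) ~ {low:.3f}   P(z > 1.96) ~ {high:.3f}")
```

In a toy model like this, the two tails are typically lopsided at small n and move toward 2.5% each as n grows. Nothing in the setup makes a t distribution the right small-sample correction.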
With small sample sizes, likelihood ratio tests are typically more reliable than Wald tests, but they require refitting the model with the coefficient of interest held fixed, and over a range of such values if you want a confidence interval. A way to proceed is outlined in this answer. I'm not sure whether there is any built-in way to do this for Cox models in R, but I recall that SAS can do this directly.
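As a rough sketch of what "refitting over a range of coefficient values" looks like, the example below uses a two-group exponential proportional-hazards model without censoring rather than a Cox model. That is an assumption made so that the refit at each fixed log hazard ratio $\beta$ has a closed form. The function names, grid limits and simulated data are all hypothetical. The likelihood ratio interval collects the values of $\beta$ that a 5% likelihood ratio test would not reject.

```python
import math
import random

CHI2_1_95 = 3.841458820694124  # 95th percentile of chi-square with 1 df

def profile_loglik(beta, t0, t1):
    """Log-likelihood at a fixed log hazard ratio, maximized over the baseline rate."""
    n = len(t0) + len(t1)
    # "Refit" step: with beta fixed, the baseline rate has a closed-form MLE.
    rate0_hat = n / (sum(t0) + math.exp(beta) * sum(t1))
    return len(t1) * beta + n * math.log(rate0_hat) - n

def beta_mle(t0, t1):
    """Unrestricted MLE of the log hazard ratio (group 1 versus group 0)."""
    return math.log((len(t1) / sum(t1)) / (len(t0) / sum(t0)))

def lr_statistic(beta_null, t0, t1):
    """Likelihood ratio statistic for H0: beta = beta_null."""
    best = profile_loglik(beta_mle(t0, t1), t0, t1)
    return 2.0 * (best - profile_loglik(beta_null, t0, t1))

def profile_ci(t0, t1, lo=-5.0, hi=5.0, steps=2001):
    """Grid values of beta that the 5% likelihood ratio test does not reject."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    kept = [b for b in grid if lr_statistic(b, t0, t1) <= CHI2_1_95]
    return min(kept), max(kept)

def wald_ci(t0, t1):
    """Wald interval; without censoring, var(beta_hat) is about 1/n0 + 1/n1."""
    b = beta_mle(t0, t1)
    se = math.sqrt(1.0 / len(t0) + 1.0 / len(t1))
    return b - 1.959963984540054 * se, b + 1.959963984540054 * se

if __name__ == "__main__":
    rng = random.Random(42)
    t0 = [rng.expovariate(1.0) for _ in range(15)]  # baseline group
    t1 = [rng.expovariate(2.0) for _ in range(15)]  # hazard doubled
    print("beta_hat  :", beta_mle(t0, t1))
    print("LR stat   :", lr_statistic(0.0, t0, t1))
    print("profile CI:", profile_ci(t0, t1))
    print("Wald CI   :", wald_ci(t0, t1))
```

With a real Cox model, the closed-form step would be replaced by refitting the partial likelihood with the coefficient held fixed at each grid value.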
This page discusses related matters in the context of logistic regression, which is likewise fit by maximum likelihood and tested with asymptotic statistics.