Wald Test – Comparison of t- and z-Distributions in OLS and GLM Regression

I understand that the Wald test for regression coefficients is based on the following property that holds asymptotically (e.g. Wasserman (2006): All of Statistics, pages 153, 214-215):
$$
\frac{(\hat{\beta}-\beta_{0})}{\widehat{\operatorname{se}}(\hat{\beta})}\sim \mathcal{N}(0,1)
$$
where $\hat{\beta}$ denotes the estimated regression coefficient, $\widehat{\operatorname{se}}(\hat{\beta})$ denotes the estimated standard error of that coefficient, and $\beta_{0}$ is the hypothesized value (usually $\beta_{0}=0$, to test whether the coefficient differs significantly from 0). So, for $\beta_{0}=0$, the size $\alpha$ Wald test is: reject $H_{0}$ when $|W|> z_{\alpha/2}$, where
$$
W=\frac{\hat{\beta}}{\widehat{\operatorname{se}}(\hat{\beta})}.
$$

But when you perform a linear regression with lm in R, a $t$-value instead of a $z$-value is used to test whether the regression coefficients differ significantly from 0 (with summary.lm). Moreover, the output of glm in R sometimes gives $z$-values and sometimes $t$-values as test statistics. Apparently, $z$-values are used when the dispersion parameter is assumed to be known and $t$-values are used when the dispersion parameter is estimated (see this link).

Could someone explain why a $t$-distribution is sometimes used for a Wald test even though the ratio of the coefficient to its standard error is assumed to follow a standard normal distribution?

Edit after the question was answered

This post also provides useful information on the question.

Best Answer

The output from glm with a Poisson family gives a $z$-value because, in a Poisson distribution, the mean and the variance are the same, so the dispersion parameter is fixed at 1 rather than estimated. In the Poisson model, you only have to estimate a single parameter ($\lambda$). The same holds for the binomial family. In a glm where you have to estimate both a mean and a dispersion parameter (for example the gaussian, Gamma, or quasi families), you should see the $t$-distribution used.
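To see this in the output, here is a minimal sketch on simulated data. The variable names, sample size, seed, and coefficient values are arbitrary assumptions made for illustration; only the column labels of the coefficient tables matter.

```r
# Simulated data; all numbers here are illustrative assumptions.
set.seed(1)
n <- 50
x <- rnorm(n)
y_count <- rpois(n, lambda = exp(0.3 + 0.5 * x))  # count response
y_cont  <- 1 + 2 * x + rnorm(n)                   # continuous response

fit_pois  <- glm(y_count ~ x, family = poisson)       # dispersion fixed at 1
fit_quasi <- glm(y_count ~ x, family = quasipoisson)  # dispersion estimated
fit_gauss <- glm(y_cont ~ x, family = gaussian)       # dispersion estimated

# Column names of the coefficient tables reported by summary()
cols_pois  <- colnames(coef(summary(fit_pois)))
cols_quasi <- colnames(coef(summary(fit_quasi)))
cols_gauss <- colnames(coef(summary(fit_gauss)))

print(cols_pois)   # contains "z value"
print(cols_quasi)  # contains "t value"
print(cols_gauss)  # contains "t value"
```

Note that the Poisson and quasi-Poisson fits have identical coefficient estimates; only the standard errors and the reference distribution change once the dispersion is estimated.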

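For a standard linear regression, you assume the error term is normally distributed. Here, the error variance has to be estimated, hence the use of the $t$-distribution for the test statistic: under normal errors, the ratio of the coefficient to its estimated standard error follows a $t$-distribution with $n-p$ degrees of freedom exactly, where $n$ is the number of observations and $p$ the number of estimated coefficients. If you somehow knew the population variance of the error term, you could use a $z$-test statistic instead.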
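As a rough illustration, the statistic reported by summary.lm can be rebuilt by hand: it is the same ratio $W$ as in the question, and only the reference distribution differs. The data below are simulated, and the names (`sigma2_hat`, `se_hat`, `w`) are assumptions of this sketch rather than anything from R itself.

```r
# Simulated data; sample size and coefficients are illustrative assumptions.
set.seed(2)
n <- 20
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)
fit <- lm(y ~ x)

X <- model.matrix(fit)
p <- ncol(X)
df_resid <- n - p                                 # residual degrees of freedom

# Estimated error variance and coefficient standard errors
sigma2_hat <- sum(residuals(fit)^2) / df_resid
se_hat <- sqrt(diag(sigma2_hat * solve(crossprod(X))))

# Wald ratio: estimate divided by its estimated standard error
w <- coef(fit) / se_hat

# Two-sided p-values under the t and the standard normal reference
p_t <- 2 * pt(abs(w), df = df_resid, lower.tail = FALSE)
p_z <- 2 * pnorm(abs(w), lower.tail = FALSE)

tab <- coef(summary(fit))
print(cbind(tab, p_z = p_z))  # "t value" equals w; p_z is never larger than Pr(>|t|)
```

Because the $t$-distribution has heavier tails, the normal reference gives smaller p-values for the same ratio, which is why it would be anti-conservative in small samples.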

As you mention in your post, the distribution of the test statistic is asymptotically normal. The $t$-distribution converges to the standard normal as the degrees of freedom grow, so in a large sample the difference is negligible.
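A quick way to check how fast the two agree is to compare critical values. The sketch below uses $\alpha = 0.05$ and a handful of degrees of freedom chosen arbitrarily for illustration.

```r
alpha <- 0.05
z_crit <- qnorm(1 - alpha / 2)         # two-sided standard normal critical value

df <- c(5, 10, 30, 100, 1000)          # illustrative degrees of freedom
t_crit <- qt(1 - alpha / 2, df = df)   # matching t critical values

# The t critical value shrinks toward the normal one as df grows
print(data.frame(df = df, t_crit = t_crit, z_crit = z_crit))
```

With few degrees of freedom the $t$ critical value is noticeably larger than $z_{\alpha/2}$, and the gap closes as the degrees of freedom increase.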