Solved – Why is the denominator of z-score for linear regression coefficients called standard error

linear model, regression, standard error, z-score

In machine learning and linear regression applications, the z-value is given by the following formula:

$$z_j = \frac{\beta_j}{\hat{\sigma}\sqrt{v_j}}$$

where

$$\hat{\sigma}^2 = \frac{1}{N-p-1}\sum_{i=1}^{N}(y_i-\hat{y}_i)^2$$

and $\beta_j$ are the regression coefficients, $\hat{\sigma}^2$ is the unbiased estimate of the error (residual) variance, and $v_j$ are the diagonal elements of $(X^TX)^{-1}$.

The denominator $\hat{\sigma}\sqrt{v_j}$ is generally called the standard error: why is that? Why is it considered an error, and why standard? What does it tell us or represent?

Best Answer

  • In OLS, the parameter estimates are normally distributed (exactly so under Gaussian errors, asymptotically otherwise). The standard error is the standard deviation of the sampling distribution of each estimate: it tells you how far, on average, estimates from repeated samples tend to deviate from the true population parameter (see the simulation sketch after this list).

  • Note that when you divide each estimate by its standard error, you obtain a statistic for judging how statistically significant that coefficient is. You noted the case of the z-statistic, routinely reported for Generalised Linear Models in R. The z-statistic assumes the error variance is known. However, this is often NOT the case.

  • Therefore, especially in smaller samples, it is advisable to use the t-statistic instead: the standard error itself is unchanged, but the t-distribution accounts for the extra uncertainty from estimating $\sigma$ and so yields correct p-values.

  • Having said that, in large samples the difference between using the z- and the t-statistic for determining statistical significance is negligible, as the t-distribution converges to the standard normal (the sketch below also illustrates this).

> Related posts you will most definitely find enlightening:

Difference between Standard Error and Standard Deviation

Standard Error in Linear Regression