Solved – Why are least-squares parameters normally distributed?

least squares

I am trying to figure out why the parameter
$$\hat\beta = (X^TX)^{-1}X^TY$$
is normally distributed in least-squares regression, where $Y = X\beta + \varepsilon$ is a linear function of $X$ plus normal noise $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$.
All the examples I've found have said that since
$$\begin{align*}
\hat\beta &= (X^TX)^{-1}X^TY \\
&= (X^TX)^{-1}X^T(X\beta + \varepsilon) \\
&= \beta + (X^TX)^{-1}X^T\varepsilon
\end{align*}$$
we know that
$$\hat\beta-\beta \sim \mathcal{N}(0,\sigma^2 (X^TX)^{-1})$$

I can see how the mean and variance are calculated, but why is this a normal distribution?

Best Answer

In classical statistics the parameter value $\beta$ in a linear regression model is an unknown constant. The value $\hat{\beta}$ is not a parameter; it is an estimator of the parameter, i.e., a function of the data. The reason this estimator is normally distributed is that it is a linear function of the underlying error vector (as written in the equations you have shown), the error vector is normally distributed under the model assumptions, and a fixed linear transformation of a multivariate normal vector is again multivariate normal, as spelled out below.

Note that even if you relax the normality assumption, the parameter estimator is still a weighted sum of the error terms, for which you can invoke the CLT under fairly general conditions. The distribution of the parameter estimator therefore converges to the normal distribution under broad conditions, even if the model is misspecified; the simulation sketch at the end illustrates both cases.
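To spell out the key step: write $A = (X^TX)^{-1}X^T$ (notation introduced here just for brevity). Since $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$ and $A$ is a fixed (non-random) matrix, the affine transformation property of the multivariate normal gives
$$\begin{align*}
\hat\beta - \beta = A\varepsilon &\sim \mathcal{N}\left(0,\; A(\sigma^2 I)A^T\right) \\
&= \mathcal{N}\left(0,\; \sigma^2 (X^TX)^{-1}X^TX(X^TX)^{-1}\right) \\
&= \mathcal{N}\left(0,\; \sigma^2 (X^TX)^{-1}\right),
\end{align*}$$
which is exactly the distribution stated in the question.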
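As a sanity check on both claims, here is a minimal simulation sketch in Python/NumPy. The design matrix, true coefficients, noise level, and error distributions below are illustrative choices, not anything from the question. It repeatedly draws $Y = X\beta + \varepsilon$, computes $\hat\beta$, and compares the spread of the slope estimates to the theoretical value $\sigma\sqrt{[(X^TX)^{-1}]_{22}}$, once with normal errors (exact normality) and once with skewed, centered-exponential errors of the same variance (approximate normality via the CLT):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (not from the question): intercept-plus-slope design.
n = 200
X = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, n)])
beta = np.array([1.0, 2.0])   # hypothetical true coefficients
sigma = 0.5                   # hypothetical error SD
n_sims = 10_000

def ols_slope_draws(error_sampler):
    """Simulate Y = X beta + eps repeatedly; return the OLS slope estimates."""
    slopes = np.empty(n_sims)
    for i in range(n_sims):
        eps = error_sampler(n)
        y = X @ beta + eps
        beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (X^T X)^{-1} X^T y
        slopes[i] = beta_hat[1]
    return slopes

# Case 1: normal errors -> the slope estimate is exactly normal.
normal_slopes = ols_slope_draws(lambda m: rng.normal(0.0, sigma, m))

# Case 2: skewed errors (centered exponential, same variance sigma^2)
# -> approximately normal by the CLT.
skewed_slopes = ols_slope_draws(lambda m: rng.exponential(sigma, m) - sigma)

# Theoretical SD of the slope estimate: sigma * sqrt([(X^T X)^{-1}]_{22}).
theory_sd = sigma * np.sqrt(np.linalg.inv(X.T @ X)[1, 1])
print(f"theoretical SD: {theory_sd:.4f}")
print(f"normal errors:  mean {normal_slopes.mean():.4f}, SD {normal_slopes.std():.4f}")
print(f"skewed errors:  mean {skewed_slopes.mean():.4f}, SD {skewed_slopes.std():.4f}")
```

Both empirical means sit at the true slope and both empirical SDs match the theoretical one; a histogram or QQ-plot of either set of draws would look close to Gaussian at this sample size, which is the content of the parenthetical remark above.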
