Ordinary Least Squares – Proof that OLS Model Coefficients Follow a t-Distribution with (n-k) Degrees of Freedom

Tags: least squares, linear model, regression, t-distribution

Background

Suppose we have an Ordinary Least Squares model with $k$ coefficients in the regression,
$$\mathbf{y}=\mathbf{X}\mathbf{\beta} + \mathbf{\epsilon}$$

where $\mathbf{\beta}$ is a $(k\times 1)$ vector of coefficients, $\mathbf{X}$ is the design matrix defined by

$$\mathbf{X} = \begin{pmatrix}
1 & x_{11} & x_{12} & \dots & x_{1\,(k-1)} \\
1 & x_{21} & x_{22} & \dots & x_{2\,(k-1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & x_{n2} & \dots & x_{n\,(k-1)}
\end{pmatrix}$$
and the errors are IID normal,
$$\mathbf{\epsilon} \sim \mathcal{N}\left(\mathbf{0},\sigma^2 \mathbf{I}\right) \;.$$

We minimize the sum-of-squared-errors by setting our estimates for $\mathbf{\beta}$ to be
$$\mathbf{\hat{\beta}}= (\mathbf{X^T X})^{-1}\mathbf{X}^T \mathbf{y}\;. $$

An unbiased estimator of $\sigma^2$ is
$$s^2 = \frac{\left\Vert \mathbf{y}-\mathbf{\hat{y}}\right\Vert ^2}{n-k}$$
where $\mathbf{\hat{y}} \equiv \mathbf{X} \mathbf{\hat{\beta}}$ (ref).

The covariance of $\mathbf{\hat{\beta}}$ is given by
$$\operatorname{Cov}\left(\mathbf{\hat{\beta}}\right) = \sigma^2 \mathbf{C}$$
where $\mathbf{C}\equiv(\mathbf{X}^T\mathbf{X})^{-1}$ (ref).
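
To make the background concrete, here is a minimal numpy sketch that computes $\mathbf{\hat{\beta}}$, $s^2$, and the estimated covariance $s^2\mathbf{C}$ from simulated data; the sample size, coefficients, and noise level are made-up values purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3                      # n observations, k coefficients (incl. intercept)
sigma = 2.0                       # illustrative true noise level
beta = np.array([1.0, -0.5, 0.3]) # illustrative true coefficients

X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # design matrix
y = X @ beta + rng.normal(scale=sigma, size=n)

C = np.linalg.inv(X.T @ X)        # C = (X^T X)^{-1}
beta_hat = C @ X.T @ y            # OLS estimate
resid = y - X @ beta_hat
s2 = resid @ resid / (n - k)      # unbiased estimate of sigma^2

cov_beta_hat = s2 * C             # estimated Cov(beta_hat)
se = np.sqrt(s2 * np.diag(C))     # standard errors s * sqrt(c_ii)
```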

Question

How can I prove that for $\hat\beta_i$,
$$\frac{\hat{\beta}_i - \beta_i} {s_{\hat{\beta}_i}} \sim t_{n-k}$$
where $t_{n-k}$ is a t-distribution with $(n-k)$ degrees of freedom, and the standard error of $\hat{\beta}_i$ is estimated by $s_{\hat{\beta}_i} = s\sqrt{c_{ii}}$.


My attempts

I know that for a sample of $n$ random variables $x_i \sim\mathcal{N}\left(\mu, \sigma^2\right)$, you can show that
$$\frac{\bar{x}-\mu}{s/\sqrt{n}} \sim t_{n-1} $$
by rewriting the LHS as
$$\frac{ \left(\frac{\bar x - \mu}{\sigma/\sqrt{n}}\right) }
{\sqrt{s^2/\sigma^2}}$$
and recognizing that the numerator is a standard normal random variable, while the denominator is the square root of a chi-squared random variable with $(n-1)$ degrees of freedom divided by $(n-1)$ (ref). The ratio therefore follows a t-distribution with $(n-1)$ degrees of freedom (ref).
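
For what it's worth, this one-sample result is easy to check by simulation; the sketch below (all parameter choices arbitrary) compares empirical quantiles of $(\bar{x}-\mu)/(s/\sqrt{n})$ against the $t_{n-1}$ reference.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n, reps = 5.0, 2.0, 10, 100_000

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)         # sample standard deviation
t_stat = (xbar - mu) / (s / np.sqrt(n))

# Empirical quantiles should line up with the t_{n-1} distribution.
qs = [0.05, 0.25, 0.5, 0.75, 0.95]
print(np.quantile(t_stat, qs))
print(stats.t.ppf(qs, df=n - 1))
```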

I was unable to extend this proof to my question…

Any ideas? I'm aware of this question, but the answers there don't explicitly prove it; they just give a rule of thumb, saying "each predictor costs you a degree of freedom".

Best Answer

Since $$\begin{align*} \hat\beta &= (X^TX)^{-1}X^TY \\ &= (X^TX)^{-1}X^T(X\beta + \varepsilon) \\ &= \beta + (X^TX)^{-1}X^T\varepsilon \end{align*}$$ and $\varepsilon \sim \mathcal{N}(0,\sigma^2 I)$, the vector $\hat\beta - \beta = (X^TX)^{-1}X^T\varepsilon$ is normal with mean zero and covariance $$(X^TX)^{-1}X^T\left(\sigma^2 I\right)X(X^TX)^{-1} = \sigma^2 (X^TX)^{-1}\text{,}$$ i.e. $$\hat\beta-\beta \sim \mathcal{N}(0,\sigma^2 (X^TX)^{-1})\text{.}$$ Thus, for each component $k$ of $\hat\beta$ (in this answer the subscript $k$ indexes a coefficient, while $p$ below denotes the total number of coefficients, the $k$ of the question), $$\hat\beta_k -\beta_k \sim \mathcal{N}(0, \sigma^2 S_{kk})$$ where $S_{kk}$ is the $k^\text{th}$ diagonal element of $(X^TX)^{-1}$. Thus, we know that $$z_k = \frac{\hat\beta_k -\beta_k}{\sqrt{\sigma^2 S_{kk}}} \sim \mathcal{N}(0,1).$$
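
As a quick numerical check of this step, the sketch below holds an arbitrary design $X$ fixed, redraws $\varepsilon$ many times, and verifies that $z_k$ behaves like a standard normal; all setup values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma = 40, 3, 1.5
beta = np.array([2.0, 1.0, -1.0])
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
S = np.linalg.inv(X.T @ X)

k = 1                             # component of beta to examine
z = []
for _ in range(50_000):
    eps = rng.normal(scale=sigma, size=n)
    beta_hat = S @ X.T @ (X @ beta + eps)
    z.append((beta_hat[k] - beta[k]) / np.sqrt(sigma**2 * S[k, k]))

z = np.array(z)
print(z.mean(), z.std())          # should be close to 0 and 1
```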

Take note of the statement of the Theorem for the Distribution of an Idempotent Quadratic Form in a Standard Normal Vector (Theorem B.8 in Greene):

If $x\sim\mathcal{N}(0,I)$ and $A$ is symmetric and idempotent, then $x^TAx$ is distributed $\chi^2_{\nu}$ where $\nu$ is the rank of $A$.

Let $\hat\varepsilon$ denote the regression residual vector and let $$M=I_n - X(X^TX)^{-1}X^T \text{,}$$ which is the residual maker matrix (i.e. $My=\hat\varepsilon$). It's easy to verify that $M$ is symmetric and idempotent.
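
That verification is also easy to do numerically; in this sketch (with an arbitrary example design), $M$ comes out symmetric and idempotent, and $My$ matches the least-squares residual vector.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.normal(size=n)

M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(M, M.T))        # symmetric
print(np.allclose(M @ M, M))      # idempotent
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(M @ y, y - X @ beta_hat))  # M y = residuals
```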

Let $$s^2 = \frac{\hat\varepsilon^T \hat\varepsilon}{n-p}$$ be an estimator for $\sigma^2$.

We then need to do some linear algebra. Note these three linear algebra properties:

  • The rank of an idempotent matrix is its trace.
  • $\operatorname{Tr}(A_1+A_2) = \operatorname{Tr}(A_1) + \operatorname{Tr}(A_2)$
  • $\operatorname{Tr}(A_1A_2) = \operatorname{Tr}(A_2A_1)$ if $A_1$ is $n_1 \times n_2$ and $A_2$ is $n_2 \times n_1$ (this property is critical for the below to work)

So $$\begin{align*} \operatorname{rank}(M) = \operatorname{Tr}(M) &= \operatorname{Tr}\left(I_n - X(X^TX)^{-1}X^T\right) \\ &= \operatorname{Tr}(I_n) - \operatorname{Tr}\left( X(X^TX)^{-1}X^T \right) \\ &= \operatorname{Tr}(I_n) - \operatorname{Tr}\left( (X^TX)^{-1}X^TX \right) \\ &= \operatorname{Tr}(I_n) - \operatorname{Tr}(I_p) \\ &=n-p \end{align*}$$
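
A numerical spot-check of the trace/rank identity, on the same kind of arbitrary example design:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 20, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

print(np.trace(M))                # = n - p = 16
print(np.linalg.matrix_rank(M))   # = n - p = 16
```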

Since $MX = X - X(X^TX)^{-1}X^TX = 0$, the residual vector satisfies $\hat\varepsilon = My = M(X\beta+\varepsilon) = M\varepsilon$, and because $M$ is symmetric and idempotent, $$\begin{align*} V = \frac{(n-p)s^2}{\sigma^2} = \frac{\hat\varepsilon^T\hat\varepsilon}{\sigma^2} = \frac{\varepsilon^T M^T M \varepsilon}{\sigma^2} = \left(\frac{\varepsilon}{\sigma}\right)^T M \left(\frac{\varepsilon}{\sigma}\right). \end{align*}$$

Applying the Theorem for the Distribution of an Idempotent Quadratic Form in a Standard Normal Vector (stated above), we know that $V \sim \chi^2_{n-p}$.
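
A Monte Carlo sanity check of this conclusion (all parameter values are made up): the simulated $V$ should have mean $n-p$ and variance $2(n-p)$, matching $\chi^2_{n-p}$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, sigma = 30, 3, 2.0
beta = np.array([1.0, 0.5, -0.5])
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix, M = I - H

V = []
for _ in range(50_000):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    resid = y - H @ y                  # = M y = residuals
    V.append(resid @ resid / sigma**2)

print(np.mean(V), np.var(V))      # approx n - p = 27 and 2(n - p) = 54
```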

Since $\varepsilon$ is assumed normally distributed, $\hat\beta - \beta = (X^TX)^{-1}X^T\varepsilon$ and $\hat\varepsilon = M\varepsilon$ are jointly normal, and their cross-covariance $\sigma^2(X^TX)^{-1}X^TM = 0$ vanishes (again because $MX = 0$, equivalently $X^TM = 0$), so $\hat\beta$ is independent of $\hat\varepsilon$; since $s^2$ is a function of $\hat\varepsilon$, it is also independent of $\hat\beta$. Thus, $z_k$ and $V$ are independent of each other.

Then, $$\begin{align*} t_k = \frac{z_k}{\sqrt{V/(n-p)}} \end{align*}$$ is the ratio of a standard normal random variable to the square root of an independent chi-squared random variable divided by its degrees of freedom (here $n-p$), which is a characterization of the $t$ distribution. Therefore, the statistic $t_k$ has a $t$ distribution with $n-p$ degrees of freedom.
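
Putting the pieces together, one can simulate $t_k$ end-to-end and compare its quantiles with those of $t_{n-p}$; the design and parameters below are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, p, sigma = 25, 3, 1.0
beta = np.array([0.0, 1.0, -1.0])
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
S = np.linalg.inv(X.T @ X)
k = 2                             # coefficient to test

t_stats = []
for _ in range(50_000):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    beta_hat = S @ X.T @ y
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - p)
    t_stats.append((beta_hat[k] - beta[k]) / np.sqrt(s2 * S[k, k]))

# Empirical vs theoretical t_{n-p} quantiles should line up closely.
qs = [0.05, 0.5, 0.95]
print(np.quantile(t_stats, qs))
print(stats.t.ppf(qs, df=n - p))
```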

It can then be algebraically manipulated into a more familiar form.

$$\begin{align*} t_k &= \frac{\frac{\hat\beta_k -\beta_k}{\sqrt{\sigma^2 S_{kk}}}}{\sqrt{\frac{(n-p)s^2}{\sigma^2}/(n-p)}} \\ &= \frac{\frac{\hat\beta_k -\beta_k}{\sqrt{S_{kk}}}}{\sqrt{s^2}} = \frac{\hat\beta_k -\beta_k}{\sqrt{s^2 S_{kk}}} \\ &= \frac{\hat\beta_k -\beta_k}{\operatorname{se}\left(\hat\beta_k \right)} \end{align*}$$
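
As a final sanity check, assuming the statsmodels package is available, the standard errors and t-statistics it reports for an OLS fit match the construction above (its results.tvalues are computed under the null $\beta_k = 0$).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 30
X = sm.add_constant(rng.normal(size=(n, 2)))   # intercept plus 2 predictors
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

results = sm.OLS(y, X).fit()
S = np.linalg.inv(X.T @ X)
s2 = results.resid @ results.resid / (n - X.shape[1])

print(np.allclose(results.bse, np.sqrt(s2 * np.diag(S))))        # se(beta_hat_k) = s sqrt(S_kk)
print(np.allclose(results.tvalues, results.params / results.bse))  # t_k under beta_k = 0
```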