Regression – Distribution of Sum of Squares Error

degrees-of-freedom, mathematical-statistics, regression, sums-of-squares

I know the distribution of the sample variance:
$$
\sum\frac{(X_i-\bar{X})^2}{\sigma^2}\sim \chi^2_{(n-1)}
$$
$$
\sum\frac{(X_i-\bar{X})^2}{n-1}\sim \frac{\sigma^2}{n-1}\chi^2_{(n-1)}
$$
This follows from the fact that $\sum(X_i-\bar{X})^2$ can be written as a quadratic form $x'Ax$ (where $A$ is symmetric), which in turn can be written as $x'QDQ'x$ (where $Q$ is orthonormal and $D$ is diagonal).

What about $\sum(Y_i-\hat{\beta}_0-\hat{\beta}_1X_i)^2$, given the assumption $(Y - \beta_0 - \beta_1X)\sim \mathcal{N}(0, \sigma^2)$?

I figure $$\sum\frac{(Y_i-\hat{\beta}_0-\hat{\beta}_1X_i)^2}{\sigma^2}\sim \chi^2_{(n-2)}.$$

But I have no clue how to prove it or show it.

Is it distributed exactly as $\chi^2_{(n-2)}$?
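
(Not a proof, but a quick simulation makes the conjecture easy to sanity-check. The sketch below, with arbitrary values for `beta0`, `beta1`, `sigma` and `n`, repeatedly fits a simple linear regression and compares the mean and variance of $\text{SSE}/\sigma^2$ with those of a $\chi^2_{(n-2)}$ distribution, i.e. $n-2$ and $2(n-2)$.)

```python
# Monte Carlo check that SSE / sigma^2 behaves like chi^2_{n-2} in simple regression.
# All model parameters below are arbitrary choices for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, beta0, beta1, sigma = 20, 1.0, 2.0, 1.5
x = np.linspace(0, 10, n)

stats = []
for _ in range(20000):
    y = beta0 + beta1 * x + rng.normal(0, sigma, n)
    b1, b0 = np.polyfit(x, y, 1)              # slope, intercept
    resid = y - (b0 + b1 * x)
    stats.append(resid @ resid / sigma**2)

stats = np.array(stats)
print(stats.mean(), n - 2)         # both ~ 18
print(stats.var(), 2 * (n - 2))    # both ~ 36
```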

Best Answer

We can prove this for the more general case of $p$ predictor variables by using the "hat matrix" and some of its useful properties. These results are usually much harder to state without matrix notation because of the use of the spectral decomposition.

Now, in the matrix version of least squares, the hat matrix is $H=X(X^TX)^{-1}X^T$, where $X$ has $n$ rows and $p+1$ columns (a column of ones for $\beta_0$). Assume full column rank for convenience; otherwise you can replace $p+1$ by the column rank of $X$ in what follows. We can write the fitted values as $\hat{Y}_i=\sum_{j=1}^nH_{ij}Y_j$, or in matrix notation $\hat{Y}=HY$. Using this, we can write the sum of squares as:

$$\frac{\sum_{i=1}^n(Y_i-\hat{Y}_i)^2}{\sigma^2}=\frac{(Y-\hat{Y})^T(Y-\hat{Y})}{\sigma^2}=\frac{(Y-HY)^T(Y-HY)}{\sigma^2}=\frac{Y^T(I_n-H)Y}{\sigma^2}$$

where $I_n$ is the identity matrix of order $n$. The last step follows from the fact that $H$ is a symmetric, idempotent matrix, so $(I_n-H)^T(I_n-H)=I_n-H$: $$H^2=[X(X^TX)^{-1}X^T][X(X^TX)^{-1}X^T]=X(X^TX)^{-1}X^T=H=HH^T=H^TH$$
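
As a quick numerical illustration of the hat matrix (a toy design with arbitrary values), the sketch below checks that $\hat{Y}=HY$ matches an ordinary least-squares fit and that $H$ is symmetric and idempotent:

```python
# Build the hat matrix for a small random design and verify its basic properties.
import numpy as np

rng = np.random.default_rng(1)
n, p = 10, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # n x (p+1), ones column
Y = rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
Y_hat = H @ Y

beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(Y_hat, X @ beta_hat))               # True: HY gives the fitted values
print(np.allclose(H @ H, H), np.allclose(H, H.T))     # True True: idempotent, symmetric
```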

Now, a neat property of idempotent matrices is that all of their eigenvalues must be equal to zero or one. Letting $e$ denote a normalised eigenvector of $H$ with eigenvalue $l$, we can prove this as follows:

$$He=le\implies H(He)=H(le)$$ $$\text{LHS}=H^2e=He=le\qquad \text{RHS}=lHe=l^2e$$ $$\implies le=l^2e\implies l=0\text{ or }1$$

(Note that $e$ cannot be zero, since it must satisfy $e^Te=1$.) Now, because $H$ is idempotent, $I_n-H$ is idempotent as well, since

$$(I_n-H)(I_n-H)=I_n-I_nH-HI_n+H^2=I_n-2H+H=I_n-H$$

We also have the property that the sum of the eigenvalues of a matrix equals its trace, and, using the cyclic property of the trace, $$\operatorname{tr}(I_n-H)=\operatorname{tr}(I_n)-\operatorname{tr}(H)=n-\operatorname{tr}(X(X^TX)^{-1}X^T)=n-\operatorname{tr}((X^TX)^{-1}X^TX)$$ $$=n-\operatorname{tr}(I_{p+1})=n-p-1$$

Hence $I-H$ must have $n-p-1$ eigenvalues equal to $1$ and $p+1$ eigenvalues equal to $0$.
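
A short numerical check of these eigenvalue and trace facts, again on an arbitrary toy design:

```python
# The eigenvalues of I - H are all 0 or 1; exactly n - p - 1 equal 1, matching the trace.
import numpy as np

rng = np.random.default_rng(2)
n, p = 12, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - H

eigvals = np.linalg.eigvalsh(M)                        # M is symmetric
print(np.round(eigvals, 10))                           # only 0s and 1s
print(int(np.isclose(eigvals, 1).sum()), n - p - 1)    # both 8
print(round(np.trace(M)), n - p - 1)                   # both 8
```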

Now we can use the spectral decomposition $I-H=ADA^T$, where $D=\begin{pmatrix}I_{n-p-1} & 0_{[n-p-1]\times[p+1]}\\0_{[p+1]\times [n-p-1]} & 0_{[p+1]\times [p+1]}\end{pmatrix}$ and $A$ is orthogonal (because $I-H$ is symmetric). A further useful property is that $HX=X$. This helps narrow down the $A$ matrix:

$$HX=X\implies(I-H)X=0\implies ADA^TX=0\implies DA^TX=0$$ $$\implies (A^TX)_{ij}=0\;\;\;i=1,\dots,n-p-1\;\;\; j=1,\dots,p+1$$

and we get:

$$\frac{\sum_{i=1}^n(Y_i-\hat{Y}_i)^2}{\sigma^2}=\frac{Y^TADA^TY}{\sigma^2}=\frac{\sum_{i=1}^{n-p-1}(A^TY)_i^2}{\sigma^2}$$
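
The sketch below verifies these steps deterministically on an arbitrary toy design: $HX=X$, the first $n-p-1$ rows of $A^TX$ vanish, and the residual sum of squares equals the sum of the corresponding squared components of $A^TY$:

```python
# Check HX = X, the vanishing rows of A^T X, and the SSE identity on a toy design.
import numpy as np

rng = np.random.default_rng(4)
n, p = 10, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - H

vals, A = np.linalg.eigh(M)                    # M = A diag(vals) A^T, vals ascending
order = np.argsort(-vals)                      # put the eigenvalue-1 columns first
A = A[:, order]

print(np.allclose(H @ X, X))                               # True: HX = X
print(np.allclose((A.T @ X)[: n - p - 1, :], 0))           # True: those rows are 0
Z = A.T @ Y
print(np.isclose(np.sum(Z[: n - p - 1] ** 2), Y @ M @ Y))  # True: same SSE
```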

Now, under the model we have $Y\sim N(X\beta,\sigma^2I)$, and by standard normal theory $A^TY\sim N(A^TX\beta,\sigma^2A^TA)= N(A^TX\beta,\sigma^2I)$, which shows that the components of $A^TY$ are independent. Because $(A^TX)_{ij}=0$ for $i=1,\dots,n-p-1$, the first $n-p-1$ components of the mean vector $A^TX\beta$ are zero, so $(A^TY)_i\sim N(0,\sigma^2)$ for $i=1,\dots,n-p-1$. The chi-square distribution with $n-p-1$ degrees of freedom for the sum of squared errors follows immediately.
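
Finally, a Monte Carlo sanity check (arbitrary toy model, not a substitute for the proof): $\text{SSE}/\sigma^2$ should have mean $n-p-1$ and variance $2(n-p-1)$, matching $\chi^2_{(n-p-1)}$:

```python
# Simulate the multiple-regression model and compare moments of SSE / sigma^2
# with those of chi^2_{n-p-1}. Parameter values are arbitrary.
import numpy as np

rng = np.random.default_rng(5)
n, p, sigma = 15, 3, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = rng.normal(size=p + 1)

H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - H

stats = []
for _ in range(20000):
    Y = X @ beta + rng.normal(0, sigma, n)
    stats.append((Y @ M @ Y) / sigma**2)       # SSE / sigma^2 via Y^T (I - H) Y

stats = np.array(stats)
print(stats.mean(), n - p - 1)                 # both ~ 11
print(stats.var(), 2 * (n - p - 1))            # both ~ 22
```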
