In least squares estimation, why are the residuals constrained to lie within the space defined by the following equations?

Tags: least-squares, linear-algebra, regression, statistics

I've been reading through the Wikipedia article on degrees of freedom (statistics). There is a section about residuals, in relation to least squares estimation. The article says:

Suppose you have some model $Y_i=a+bx_i + \epsilon_i \text{ for } i=1,\dots,n$.

Let $\hat a$ and $\hat b$ be least squares estimators of $a$ and $b$.

We can compute the residuals as follows: $\hat e_i=y_i-(\hat a + \hat b x_i)$.

The article then says that these residuals are constrained to lie within the space defined by:

$\hat e_1 + \dots + \hat e_n=0$ and $x_1 \hat e_1 + \dots + x_n \hat e_n=0$.

Hence, they say there are $n-2$ degrees of freedom for error.

So, my first question is, where have these two constraints come from?

I guess the first one comes from the fact that the mean of the residuals is supposed to be $0$. The second one, I am not sure about.

I suppose when they say there are $n-2$ degrees of freedom for error, it means the residuals are constrained to lie within an ($n-2$)-dimensional subspace? Hence, my second question is, why do these constraints mean that the residuals are constrained to an ($n-2$)-dimensional subspace?
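For concreteness, here is a minimal numerical sketch (the toy data and variable names are my own, not from the article) that fits $y = a + bx$ by least squares and checks that both constraints hold for the resulting residuals:

```python
import numpy as np

# Hypothetical toy data (purely illustrative, not from the original question).
rng = np.random.default_rng(0)
n = 10
x = rng.normal(size=n)
y = 1.5 + 2.0 * x + rng.normal(scale=0.3, size=n)

# Least squares fit of y = a + b*x via the design matrix [1, x].
X = np.column_stack([np.ones(n), x])
a_hat, b_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Residuals e_i = y_i - (a_hat + b_hat * x_i).
e = y - (a_hat + b_hat * x)

# Both constraints hold up to floating-point error.
print(np.sum(e))      # ~0: sum of residuals
print(np.sum(x * e))  # ~0: x-weighted sum of residuals
```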

Best Answer

You find $\hat{a}$ and $\hat{b}$ by minimizing, over $a$ and $b$, the sum of squared errors:

$$\sum_{i=1}^n e_i^2=\sum_{i=1}^n(y_i-a-bx_i)^2$$

so taking partial derivatives with respect to $a$ and $b$ and setting them equal to $0$ yields:

$$0=\left[\frac{\partial}{\partial a}\sum_{i=1}^n(y_i-a-bx_i)^2\right]_{a=\hat{a},b=\hat{b}}=-2\sum_{i=1}^n(y_i-\hat{a}-\hat{b}x_i)=-2\sum_{i=1}^n \hat{e_i}$$

$$0=\left[\frac{\partial}{\partial b}\sum_{i=1}^n(y_i-a-bx_i)^2\right]_{a=\hat{a},b=\hat{b}}=-2\sum_{i=1}^n(y_i-\hat{a}-\hat{b}x_i)x_i=-2\sum_{i=1}^n \hat{e_i}x_i$$

which gives you the two constraints. As for your second question: these are two linearly independent linear conditions on $(\hat e_1,\dots,\hat e_n)\in\mathbb{R}^n$ (provided the $x_i$ are not all equal), so the residual vector is confined to an $(n-2)$-dimensional subspace, which is exactly the $n-2$ degrees of freedom for error.
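To see the subspace interpretation concretely, here is a small sketch (toy data and illustrative names, assuming numpy is available) showing that the normal equations $X^\top \hat e = 0$ make the residual vector orthogonal to both columns of the design matrix, leaving it in the $(n-2)$-dimensional orthogonal complement of a 2-dimensional column space:

```python
import numpy as np

# Hypothetical example (data and names are illustrative assumptions).
rng = np.random.default_rng(1)
n = 8
x = rng.normal(size=n)
y = -0.5 + 3.0 * x + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x])   # columns: (1,...,1) and (x_1,...,x_n)
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta_hat                   # residual vector

# Normal equations: X^T e = 0, i.e. e is orthogonal to both columns of X.
print(X.T @ e)                         # ~ [0, 0]

# The orthogonal complement of the 2-dimensional column space has dimension n - 2,
# matching the "degrees of freedom for error".
print(n - np.linalg.matrix_rank(X))    # n - 2
```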
