As Jonathan says, you are indeed correct - well spotted. Basically, if you have a 1-factor ANOVA with say $q$ levels, then you have observations indexed by $j$ and $k$: $Y_{jk}$, $j=1,\dots,q$, $k=1,\dots,n_{j}$. The ANOVA model is $Y_{jk}=\mu_{j}+\epsilon_{jk}$, where $\mu_{j}$ represents the unknown true factor levels you want to estimate. This model can be written in "regression" notation by indexing your variables with just one index, say $i$: $Y_{i}=\sum\nolimits_{j=1}^{q}x_{ij}\beta_{j}+\epsilon_{i}$, $i=1,\dots,N$, where $x_{ij}=1$ for exactly one $j$ and zero otherwise. We assume that among your $N$ regression observations there are $n_{j}$ observations with $x_{ij}=1$, and that $\sum\nolimits_{j=1}^{q}n_{j}=N$. Letting $\hat{Y}_{i}=\sum\nolimits_{j=1}^{q}x_{ij}\hat{\beta}_{j}$ we see that
$SSR=\sum\nolimits_{i=1}^{N}(\hat{Y_{i}}-\bar{Y})^{2}=\sum\nolimits_{i=1}^{N}(\sum\nolimits_{j=1}^{q}x_{ij}\hat{\beta_{j}}-\bar{Y})^{2}=\sum\nolimits_{j=1}^{q}n_{j}(\hat{\beta}_{j}-\bar{Y})^{2}$.
Now due to $x_{ij}$ being zero or one we find that $\hat{\beta}_{j}=\bar{Y}_{j}$ (I mean this to denote the average of the $Y$'s where $x_{ij}=1$), thus $\sum\nolimits_{i=1}^{N}(\hat{Y_{i}}-\bar{Y})^{2}=\sum\nolimits_{j=1}^{q}n_{j}(\bar{Y}_{j}-\bar{Y})^{2}$. In ANOVA notation we have $\bar{Y}_{j}=\bar{Y}_{j\cdot}$, and so
$SSR=\sum\nolimits_{i=1}^{N}(\hat{Y_{i}}-\bar{Y})^{2}=\sum\nolimits_{j=1}^{q}n_{j}(\bar{Y}_{j\cdot}-\bar{Y})^{2}=SSTR$.
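If you want to see this identity numerically, here is a minimal numpy sketch (the level means, group sizes, and seed are made up for illustration). It fits the dummy-variable regression and checks that $\hat{\beta}_{j}=\bar{Y}_{j\cdot}$ and that $SSR=SSTR$:

```python
import numpy as np

rng = np.random.default_rng(0)

# One factor with q = 3 levels and unequal group sizes n_j (made-up values).
n_j = [4, 6, 5]
means = [2.0, 5.0, 3.0]
groups = np.repeat(np.arange(3), n_j)                      # level index of each observation
y = np.concatenate([rng.normal(mu, 1.0, n) for mu, n in zip(means, n_j)])

# Regression form: one 0/1 dummy column per level, no intercept.
X = np.zeros((y.size, 3))
X[np.arange(y.size), groups] = 1.0

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)           # should equal the group means
y_hat = X @ beta_hat
y_bar = y.mean()

ssr = np.sum((y_hat - y_bar) ** 2)                                               # regression SS
sstr = sum(n * (y[groups == j].mean() - y_bar) ** 2 for j, n in enumerate(n_j))  # treatment SS

print(np.allclose(beta_hat, [y[groups == j].mean() for j in range(3)]))          # True
print(np.isclose(ssr, sstr))                                                     # True
```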
Basically, ANOVA is just a restricted form of regression, the restriction being that the explanatory variables are factor (dummy) variables rather than continuous ones. I find it much easier to learn about regression first and then to think of ANOVA in this way, because all of the theory of regression carries over to ANOVA, whereas the theory about ANOVA sums of squares only applies to these specific regression models and not to a general one (where both continuous and factor variables are present). If you have the time, it is worth learning about regression as well as ANOVA: ANOVA theory gets you thinking from a designed-experiment viewpoint (randomised controlled trials), whilst regression theory is more general, since it is really about analysing data you already have (not necessarily from a designed experiment).
There are many different ways to look at degrees of freedom. I wanted to provide a rigorous answer that starts from a concrete definition of degrees of freedom for a statistical estimator as this may be useful/satisfying to some readers:
Definition: Given an observational model of the form $$y_i=r(x_i)+\xi_i,\ \ \ i=1,\dots,n,$$ where the $\xi_i\sim\mathcal{N}(0,\sigma^2)$ are i.i.d. noise terms and the $x_i$ are fixed, the degrees of freedom (DOF) of an estimator $\hat{y}$ is defined as $$\text{df}(\hat{y})=\frac{1}{\sigma^2}\sum_{i=1}^n\text{Cov}(\hat{y}_i,y_i)=\frac{1}{\sigma^2}\text{Tr}(\text{Cov}(\hat{y},y)),$$ or equivalently, by Stein's lemma, $$\text{df}(\hat{y})=\mathbb{E}(\text{div}\,\hat{y}).$$
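As a small sanity check on this definition (not part of the original argument), here is a Monte Carlo sketch in numpy, with a made-up signal and noise level: for the constant-fit estimator $\hat{y}_i=\bar{y}$, the covariance formula should give df $\approx 1$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo check of the covariance definition of df, using the simplest
# estimator y_hat_i = mean(y) for every i; its df should be exactly 1.
n, sigma, reps = 20, 2.0, 200_000
mu = np.linspace(0.0, 1.0, n)                  # arbitrary fixed signal r(x_i)

y = mu + sigma * rng.standard_normal((reps, n))
y_hat = y.mean(axis=1, keepdims=True) * np.ones((1, n))

# df = (1/sigma^2) * sum_i Cov(y_hat_i, y_i), estimated across replications.
cov_terms = ((y_hat - y_hat.mean(axis=0)) * (y - y.mean(axis=0))).mean(axis=0)
print(cov_terms.sum() / sigma**2)              # approximately 1
```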
Using this definition, let's analyze linear regression.
Linear Regression: Consider the model $$y_i=x_i\beta +\xi_i,$$ where the $x_i\in\mathbb{R}^p$ are fixed row vectors. In your case, $p=2$, the $x_i=[z_i,\ 1]$ correspond to a data point and the constant $1$, and $\beta=\left[\begin{array}{c} m\\ b \end{array}\right]$ is a slope and an intercept term, so that $x_i \beta=m z_i+b$. This can be rewritten as $$y=X\beta+\xi,$$ where $X$ is the $n\times p$ matrix whose $i^{th}$ row is $x_i$. The least squares estimator is $\hat{\beta}^{LS}=(X^T X)^{-1}X^Ty$. Let's now, based on the above definition, calculate the degrees of freedom of $SST$, $SSR$, and $SSE$.
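For concreteness, here is a minimal numpy sketch of this slope-and-intercept case (the data and the true $m$, $b$ are made up), computing $\hat{\beta}^{LS}$ from the normal equations and comparing it with a library fit:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative slope/intercept fit: x_i = [z_i, 1], beta = [m, b].
n = 30
z = rng.uniform(0.0, 10.0, n)
y = 1.5 * z + 4.0 + rng.standard_normal(n)          # made-up true m = 1.5, b = 4

X = np.column_stack([z, np.ones(n)])                # i-th row is [z_i, 1]
beta_ls = np.linalg.solve(X.T @ X, X.T @ y)         # (X^T X)^{-1} X^T y

print(beta_ls)                                      # approximately [m, b]
print(np.polyfit(z, y, 1))                          # same slope and intercept
```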
$SST:$ For this, we need to calculate $$\text{df}(y_i-\overline{y})=\frac{1}{\sigma^2}\sum_{i=1}^n\text{Cov}(y_i-\overline{y},y_i)=n-\frac{1}{\sigma^2}\sum_{i=1}^n\text{Cov}(\overline{y},y_i)=n-\frac{1}{\sigma^2}\sum_{i=1}^n \frac{\sigma^2}{n}=n-1.$$
$SSR:$ For this, we need to calculate $$\text{df}(X\hat{\beta}^{LS}-\overline{y})=\frac{1}{\sigma^2}\text{Tr}\left(\text{Cov}(X(X^TX)^{-1}X^Ty,\,y)\right)-\text{df}(\overline{y})$$ $$=-1+\frac{1}{\sigma^2}\text{Tr}\left(X(X^TX)^{-1}X^T\text{Cov}(y,y)\right)$$ $$=-1+\text{Tr}\left(X(X^TX)^{-1}X^T\right)$$ $$=p-1.$$ In your case $p=2$, since you will want $X$ to include the all-ones vector so that there is an intercept term, and so the degrees of freedom will be $1$. Note, however, that with multiple regressors this is still $p-1$, i.e. the number of parameters excluding the intercept.
$SSE:$ By linearity of $\text{df}$, $\text{df}(y-X\hat{\beta}^{LS})=\text{df}(y-\overline{y})-\text{df}(X\hat{\beta}^{LS}-\overline{y})=(n-1)-(p-1)=n-p$.
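If it helps, here is a small Monte Carlo sketch (with a made-up design matrix and $\beta$) that estimates all three of these degrees of freedom directly from the covariance definition:

```python
import numpy as np

rng = np.random.default_rng(3)

# Monte Carlo check of the three df values via df = (1/sigma^2) * sum_i Cov(._i, y_i).
n, p, sigma, reps = 15, 2, 1.0, 200_000
X = np.column_stack([rng.uniform(0, 10, n), np.ones(n)])   # rows [z_i, 1], so p = 2
beta = np.array([1.5, 4.0])                                # made-up true parameters
mu = X @ beta

y = mu + sigma * rng.standard_normal((reps, n))
H = X @ np.linalg.inv(X.T @ X) @ X.T                       # hat matrix
y_hat = y @ H
y_bar = y.mean(axis=1, keepdims=True)

def df(est, y):
    """(1/sigma^2) * sum_i Cov(est_i, y_i), estimated over the replications."""
    c = ((est - est.mean(axis=0)) * (y - y.mean(axis=0))).mean(axis=0)
    return c.sum() / sigma**2

print(df(y - y_bar, y))        # approximately n - 1 = 14
print(df(y_hat - y_bar, y))    # approximately p - 1 = 1
print(df(y - y_hat, y))        # approximately n - p = 13
```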
The principle underlying least squares regression is that the sum of the squares of the errors is minimized. We can use calculus to find equations for the parameters $\beta_0$ and $\beta_1$ that minimize the sum of the squared errors.
Let $S = \displaystyle\sum\limits_{i=1}^n \left(e_i \right)^2= \sum \left(y_i - \hat{y_i} \right)^2= \sum \left(y_i - \beta_0 - \beta_1x_i\right)^2$
We want to find $\beta_0$ and $\beta_1$ that minimize the sum, $S$. We start by taking the partial derivative of $S$ with respect to $\beta_0$ and setting it to zero.
$$\frac{\partial{S}}{\partial{\beta_0}} = \sum 2\left(y_i - \beta_0 - \beta_1x_i\right)^1 (-1) = 0$$
Notice that this says, $$\begin{align}\sum \left(y_i - \beta_0 - \beta_1x_i\right) &= 0 \\ \sum \left(y_i - \hat{y_i} \right) &= 0 \qquad (eqn. 1)\end{align}$$
Hence, the sum of the residuals is zero (as expected). Rearranging and solving for $\beta_0$ we arrive at, $$\sum \beta_0 = \sum y_i -\beta_1 \sum x_i $$ $$n\beta_0 = \sum y_i -\beta_1 \sum x_i $$ $$\beta_0 = \frac{1}{n}\sum y_i -\beta_1 \frac{1}{n}\sum x_i $$
now taking the partial of $S$ with respect to $\beta_1$ and setting it to zero we have, $$\frac{\partial{S}}{\partial{\beta_1}} = \sum 2\left(y_i - \beta_0 - \beta_1x_i\right)^1 (-x_i) = 0$$
and dividing through by -2 and rearranging we have,
$$\sum x_i \left(y_i - \beta_0 - \beta_1x_i\right) = 0$$ $$\sum x_i \left(y_i - \hat{y_i} \right) = 0$$ but, again we know that $\hat{y_i} = \beta_0 + \beta_1x_i$. Thus, $x_i = \frac{1}{\beta_1}\left( \hat{y_i} - \beta_0 \right) = \frac{1}{\beta_1}\hat{y_i} -\frac{\beta_0}{\beta_1}$. Substituting this into the equation above gives the desired result.
$$\sum x_i \left(y_i - \hat{y_i} \right) = 0 $$ $$\sum \left(\frac{1}{\beta_1}\hat{y_i} - \frac{\beta_0}{\beta_1}\right) \left(y_i - \hat{y_i} \right) = 0$$ $$\frac{1}{\beta_1}\sum \hat{y_i} \left(y_i - \hat{y_i} \right) - \frac{\beta_0}{\beta_1} \sum \left(y_i - \hat{y_i} \right)= 0$$
Now, the second term is zero (by eqn. 1) and so, we arrive immediately at the desired result: $$\sum \hat{y_i} \left(y_i - \hat{y_i} \right) = 0 \qquad (eqn. 2)$$
Now, let's use eqn. 1 and eqn. 2 to show that $\sum \left(\hat{y_i} - \bar{y} \right) \left( y_i - \hat{y_i} \right) = 0$ - which was your original question.
$$\sum \left(\hat{y_i} - \bar{y} \right) \left( y_i - \hat{y_i} \right) = \sum \hat{y_i} \left( y_i - \hat{y_i} \right) - \bar{y} \sum \left( y_i - \hat{y_i} \right) = 0 - 0 = 0,$$ where the first sum vanishes by eqn. 2 and the second by eqn. 1.
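For a quick numerical confirmation of eqn. 1, eqn. 2, and the final identity, here is a short numpy sketch on made-up data (the slope, intercept, and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

# Numeric check of eqn. 1, eqn. 2 and the cross-term identity for an OLS fit.
n = 25
x = rng.uniform(0.0, 5.0, n)
y = 2.0 * x + 1.0 + rng.standard_normal(n)                  # made-up slope and intercept

b1, b0 = np.polyfit(x, y, 1)                                # least squares slope and intercept
y_hat = b0 + b1 * x
resid = y - y_hat

print(np.isclose(resid.sum(), 0.0))                         # eqn. 1
print(np.isclose((y_hat * resid).sum(), 0.0))               # eqn. 2
print(np.isclose(((y_hat - y.mean()) * resid).sum(), 0.0))  # the original identity
```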