Least squares: can't see why $SSR = a_1 \sum Y_i + b_1 \sum Y_i X_i - n\overline Y^2$. Would anyone have a reference?

least-squares · reference-request · regression · statistics

A past lecture introduced the sum of the squared differences between the estimated values of the dependent variable and its mean, $SSR := \sum \left(\hat Y_i - \overline Y\right)^2$. My lecturer's notes also offer the following formula for the sum of squares due to regression:

$$SSR = a_1 \sum Y_i + b_1 \sum Y_i X_i - n\overline Y^2$$

where $a_1, b_1$ are the coefficients of the regression line $\hat Y_i = a_1 + b_1 X_i$, and $\overline X, \overline Y$ are the arithmetic means of the data values $X_i, Y_i$.

My lecturer's notes offer no further explanation (it's a statistics-for-economics class), and no matter how much I tinker with Gauss's normal equations, I can't figure out how exactly they arrive at this result. Would anyone have a reference I could use, or an explanation of how this equation comes about?

Best Answer

It is often worthwhile to write these models in matrix form instead of expanding every possible sum. As usual, I will write vectors in lower case and matrices in upper case; otherwise the notation becomes a mess.

The model is $E(y) = X\beta,$ where $X$ is a matrix of full rank. We want the $\beta$ that minimises $\|y - X\beta\|^2$; calculus yields $$ \hat \beta = (X^\intercal X)^{-1} X^\intercal y. $$ In your case, $X = [\mathbf{1}, x],$ where $x$ is the vector of your measurements (which you denoted $X_i$) and $\mathbf{1}$ is a vector of ones. It is easy to see that $$ (X^\intercal X)^{-1} = \left[\begin{matrix} n & n\bar x\\ n \bar x & x^\intercal x\end{matrix}\right]^{-1} = \dfrac{1}{n(x^\intercal x - n \bar x^2)} \left[\begin{matrix} x^\intercal x & -n\bar x \\ -n \bar x & n\end{matrix}\right] $$ and $$ X^\intercal y = \left[\begin{matrix} n \bar y \\ x^\intercal y\end{matrix}\right], $$ so $$ \hat \beta = \dfrac{1}{x^\intercal x - n \bar x^2}\left[\begin{matrix} (x^\intercal x) \bar y - (x^\intercal y) \bar x \\ x^\intercal y - n \bar x \bar y\end{matrix}\right]. $$

If $\hat \beta = (a, b)^\intercal,$ this reduces to $$ a = \bar y - \bar x b, \quad b = \dfrac{x^\intercal y - n \bar x \bar y}{x^\intercal x - n \bar x^2} = \dfrac{s_{xy}}{s_{xx}}, $$ where, for given vectors $v$ and $w,$ we write $$ s_{vw} = \dfrac{v^\intercal w - n \bar v \bar w}{n}, $$ which is the most common estimate of the covariance between them.
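As a quick sanity check, here is a minimal sketch (assuming numpy is available; the data `x`, `y` below are arbitrary illustrative values) comparing the normal-equation solution with the closed forms $a = \bar y - \bar x b$ and $b = s_{xy}/s_{xx}$:

```python
import numpy as np

# Toy data (arbitrary, for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = 1.5 + 2.0 * x + rng.normal(scale=0.3, size=20)
n = len(x)

# Solve the normal equations (X^T X) beta = X^T y with X = [1, x].
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Closed-form coefficients: b = s_xy / s_xx, a = ybar - b * xbar.
s_xy = (x @ y - n * x.mean() * y.mean()) / n
s_xx = (x @ x - n * x.mean() ** 2) / n
b = s_xy / s_xx
a = y.mean() - b * x.mean()

print(np.allclose(beta_hat, [a, b]))  # True
```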

Now, by definition, the "Total Sum of Squares" (which is what you called SSR) is given by $\mathsf{TSS} = \|\hat y - \bar y\mathbf{1}\|^2.$ In your case, $$ \|\hat y - \bar y \mathbf{1}\|^2 = \|(a- \bar y) \mathbf{1} + bx\|^2 = (a- \bar y)^2\, \mathbf{1}^\intercal \mathbf{1} + 2(a - \bar y)b\, \mathbf{1}^\intercal x + b^2 x^\intercal x. $$ Since $a - \bar y = -\bar x b$ and $\mathbf{1}^\intercal x = n \bar x,$ this simplifies to $$ (\bar x b)^2 n - 2(\bar x b) b\, n \bar x + b^2 x^\intercal x = b^2 (x^\intercal x - n \bar x^2) = \dfrac{s_{xy}^2}{s_{xx}^2}\, n s_{xx} = \dfrac{n s_{xy}^2}{s_{xx}}. $$

Now, your expression is $$ a n \bar y + b x^\intercal y - n \bar y^2 = (\bar y-\bar x b) n \bar y + b x^\intercal y - n \bar y^2 = b(x^\intercal y - n \bar x \bar y) = \dfrac{s_{xy}}{s_{xx}} n s_{xy} = \dfrac{n s_{xy}^2}{s_{xx}}. $$ Both expressions coincide. QED
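For completeness, here is a small numerical sketch (again assuming numpy, with arbitrary toy data) checking that $\|\hat y - \bar y\mathbf{1}\|^2$, the lecture formula $a\sum Y_i + b\sum Y_i X_i - n\overline Y^2$, and the closed form $n s_{xy}^2/s_{xx}$ all agree:

```python
import numpy as np

# Toy data (arbitrary, for illustration only).
rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = 0.7 - 1.2 * x + rng.normal(scale=0.5, size=30)
n = len(x)

# Fitted coefficients via the closed forms derived above.
s_xy = (x @ y - n * x.mean() * y.mean()) / n
s_xx = (x @ x - n * x.mean() ** 2) / n
b = s_xy / s_xx
a = y.mean() - b * x.mean()
y_hat = a + b * x

# Three expressions for the same quantity.
ssr_direct = np.sum((y_hat - y.mean()) ** 2)                      # ||yhat - ybar*1||^2
ssr_lecture = a * y.sum() + b * (x * y).sum() - n * y.mean() ** 2  # lecturer's formula
ssr_closed = n * s_xy ** 2 / s_xx                                  # n * s_xy^2 / s_xx

print(np.allclose([ssr_direct, ssr_lecture], ssr_closed))  # True
```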

Amendment: Apparently, the total sum of squares is actually $\|y - \bar y \mathbf{1}\|^2,$ and what you call the sum of squares due to regression does not appear in my books (e.g. Mardia, Kent and Bibby, "Multivariate Analysis"; Seber and Lee, "Linear Regression Analysis"; Seber, "Multivariate Observations"; Takeuchi, Yanai and Mukherjee, "The Foundations of Multivariate Analysis"; Casella and Berger, "Statistical Inference"; etc.).