[Math] Show that the least squares line must pass through the center of mass

least squares, linear algebra, regression

My problem:

The point $(\bar x, \bar y)$ is the center of mass for the collection of points in Exercise 7. Show that the least squares line must pass through the center of mass. [Hint: Use a change of variables $z = x - \bar x$ to translate the problem so that the new independent variable has mean 0.]

I have already solved Exercise 7:

Given a collection of points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, let $\mathbf x = (x_1, x_2, \ldots, x_n)^T$, $\mathbf y = (y_1, y_2, \ldots, y_n)^T$, $\bar x = \frac 1n \sum_1^n x_i$, $\bar y = \frac 1n \sum_1^n y_i$ and let $y = c_0 + c_1 x$ be the linear function that gives the best least squares fit to the points. Show that if $\bar x = 0$, then $c_0 = \bar y$ and $c_1 = \frac {\mathbf x^T \mathbf y}{\mathbf x^T \mathbf x}$.

Using the result of Exercise 7 (where $\bar x = 0$), it is obvious that if $x = \bar x$ then $y = c_0 + c_1 \bar x = \bar y + 0 = \bar y$; however, the hint suggests that the problem should be solved in another way.
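Presumably the route the hint intends is something like this. Let $z_i = x_i - \bar x$, so that $\bar z = \frac 1n \sum (x_i - \bar x) = 0$. Since the change of variables is just a translation, every line in $x$ is a line in $z$ and vice versa, so the least squares line for the points $(z_i, y_i)$ is the least squares line for the original points, rewritten. Applying Exercise 7 to the $z$-data gives
$$y = \bar y + \frac{\mathbf z^T \mathbf y}{\mathbf z^T \mathbf z}\, z, \qquad \mathbf z = (z_1, \ldots, z_n)^T,$$
and at $x = \bar x$, i.e. $z = 0$, this is $y = \bar y$.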

Edit
I have found an answer. It makes use of the following theorem:

If A is an m x n matrix of rank n, the normal equations $ A^T A \mathbf x = A^T \mathbf b$ have a unique solution $ \hat {\mathbf x} = (A^TA)^{-1}A^T \mathbf b$ and $ \hat {\mathbf x} $ is the unique least squares solution of the system $ A \mathbf x = \mathbf b $.

Now let $ \hat {\mathbf x} = \mathbf c = (c_0, c_1)^T$, $A = \begin{pmatrix}1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \\\end{pmatrix}$, $\mathbf b = \mathbf y = (y_1, \ldots, y_n)^T$, so that $\mathbf c = (A^TA)^{-1}A^T\mathbf y$; then
$$\begin{pmatrix}c_0\\c_1\\\end{pmatrix} = \begin{pmatrix}n & \sum x_i\\\sum x_i & \sum x_i^2\\\end{pmatrix}^{-1} \begin{pmatrix}\sum y_i\\\sum x_iy_i\\\end{pmatrix} $$
which gives values for $c_0$ and $c_1$. Substituting these values into $y = c_0 + c_1 x$ at $x = \bar x = \frac 1n \sum x_i$ indeed yields $y = \bar y$.
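As a sanity check (not part of the exercise), here is a small NumPy sketch, with made-up data, that solves the normal equations above and confirms the resulting line passes through $(\bar x, \bar y)$:

```python
# Minimal numerical check of c = (A^T A)^{-1} A^T y; the data is
# invented purely for illustration.
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0])
y = np.array([2.0, 3.0, 5.0, 8.0])

A = np.column_stack([np.ones_like(x), x])   # n x 2 matrix with rows (1, x_i)
c = np.linalg.solve(A.T @ A, A.T @ y)       # normal equations: c = (c0, c1)

x_bar, y_bar = x.mean(), y.mean()
print(c[0] + c[1] * x_bar, y_bar)           # both print the same value
assert np.isclose(c[0] + c[1] * x_bar, y_bar)
```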

Best Answer

Assume we have the linear model $$y=X\beta$$ where \begin{align*} y_{n\times1} =\begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{bmatrix} \hspace{2cm} X_{n\times2} = \begin{bmatrix} 1 & x_1\\ 1 & x_2\\ \vdots & \vdots\\ 1 & x_n \end{bmatrix} \hspace{2cm} \beta_{2\times1} = \begin{bmatrix} b_0\\ b_1 \end{bmatrix} \end{align*} Using linear algebra, the least squares solution is (all of my sums are with respect to $i$ and go to $n$, i.e., $\sum_{i=1}^n$) \begin{align*} \beta &= (X'X)^{-1}X'y\\ &=\left( \begin{bmatrix} 1 & 1 & \cdots & 1\\ x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} 1 & x_1\\ 1 & x_2\\ \vdots & \vdots\\ 1 & x_n \end{bmatrix}\right)^{-1} \begin{bmatrix} 1 & 1 & \cdots & 1\\ x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{bmatrix}\\ &=\begin{bmatrix} n & \sum x_i\\ \sum x_i & \sum x_i^2 \end{bmatrix}^{-1}\begin{bmatrix} \sum y_i\\ \sum x_iy_i \end{bmatrix}. \end{align*} Taking the inverse is not hard, since it is a $2\times2$ matrix: \begin{align*} \beta &=\frac{1}{n\sum x_i^2-\left(\sum x_i\right)^2}\begin{bmatrix} \sum x_i^2 & -\sum x_i\\ -\sum x_i & n \end{bmatrix}\begin{bmatrix} \sum y_i\\ \sum x_iy_i \end{bmatrix}\\ &=\begin{bmatrix} \frac{\sum x_i^2\sum y_i-\sum x_i\sum x_iy_i}{n\sum x_i^2-\left(\sum x_i\right)^2}\\ \frac{n\sum x_iy_i-\sum x_i\sum y_i}{n\sum x_i^2-\left(\sum x_i\right)^2} \end{bmatrix}, \end{align*} and so \begin{align*} b_0 = \frac{\sum x_i^2\sum y_i-\sum x_i\sum x_iy_i}{n\sum x_i^2-\left(\sum x_i\right)^2} \qquad\text{and}\qquad b_1 = \frac{n\sum x_iy_i-\sum x_i\sum y_i}{n\sum x_i^2-\left(\sum x_i\right)^2}. \end{align*}
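For what it's worth, here is a quick NumPy check (data invented for illustration) that these closed-form expressions for $b_0$ and $b_1$ agree with a library least squares fit:

```python
# Compute b0, b1 directly from the sum formulas above and compare
# against NumPy's built-in degree-1 least squares fit.
import numpy as np

x = np.array([0.5, 1.5, 2.0, 3.5, 5.0])
y = np.array([1.0, 2.2, 2.9, 4.1, 6.3])
n = len(x)

den = n * np.sum(x**2) - np.sum(x)**2
b0 = (np.sum(x**2) * np.sum(y) - np.sum(x) * np.sum(x * y)) / den
b1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / den

b1_np, b0_np = np.polyfit(x, y, 1)          # polyfit returns (slope, intercept)
assert np.allclose([b0, b1], [b0_np, b1_np])
```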

Now that we have $b_0$ and $b_1$, for any value of $x$ we can compute the corresponding $y$ value. The cool thing about the least squares line is that it WILL ALWAYS pass through the point $(\bar x, \bar y)$, the mean of $x$ paired with the mean of $y$. Why is that true? Plug $\bar x$ into $y=b_0+b_1x$, and after some algebra it's easy to see.

Let's plug in $\bar x$ for $x$ in $y=b_0+b_1x$, so

\begin{align*} b_0+b_1\bar x &=\frac{\sum x_i^2\sum y_i-\sum x_i\sum x_iy_i}{n\sum x_i^2-\left(\sum x_i\right)^2}+\frac{n\sum x_iy_i-\sum x_i\sum y_i}{n\sum x_i^2-\left(\sum x_i\right)^2}\cdot\frac{1}{n}\sum x_i\\ &=\frac{\sum x_i^2\sum y_i-\sum x_i\sum x_iy_i}{n\sum x_i^2-\left(\sum x_i\right)^2}+\frac{n\sum x_iy_i\sum x_i-\sum x_i\sum y_i\sum x_i}{n\left(n\sum x_i^2-\left(\sum x_i\right)^2\right)}\\ &=\frac{\sum x_i^2\sum y_i-\sum x_i\sum x_iy_i}{n\sum x_i^2-\left(\sum x_i\right)^2}+\frac{\sum x_iy_i\sum x_i}{n\sum x_i^2-\left(\sum x_i\right)^2}-\frac{\left(\sum x_i\right)^2\sum y_i}{n\left(n\sum x_i^2-\left(\sum x_i\right)^2\right)}\\ &=\frac{\sum x_i^2\sum y_i-\sum x_i\sum x_iy_i+\sum x_iy_i\sum x_i}{n\sum x_i^2-\left(\sum x_i\right)^2}-\frac{\left(\sum x_i\right)^2\sum y_i}{n\left(n\sum x_i^2-\left(\sum x_i\right)^2\right)}\\ &=\frac{\sum x_i^2\sum y_i}{n\sum x_i^2-\left(\sum x_i\right)^2}-\frac{\left(\sum x_i\right)^2\sum y_i}{n\left(n\sum x_i^2-\left(\sum x_i\right)^2\right)}\\ &=\frac{n\sum x_i^2\sum y_i-\left(\sum x_i\right)^2\sum y_i}{n\left(n\sum x_i^2-\left(\sum x_i\right)^2\right)}\\ &=\frac{\sum y_i\left(n\sum x_i^2-\left(\sum x_i\right)^2\right)}{n\left(n\sum x_i^2-\left(\sum x_i\right)^2\right)}\\ &=\frac{1}{n}\sum y_i\\ &=\bar y. \end{align*}
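And a quick numerical confirmation of the identity $b_0 + b_1\bar x = \bar y$ on random data (again just a sketch, assuming NumPy):

```python
# Fit a line to noisy random data and check it passes through the mean point.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 3.0 * x + 1.0 + rng.normal(size=50)     # noisy line; values arbitrary

b1, b0 = np.polyfit(x, y, 1)
assert np.isclose(b0 + b1 * x.mean(), y.mean())
```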
