Regression of y on x and x on y for SSE=0


Question: Suppose $(x_i, y_i)$, $i=1,\dots,n$, is a set of pairs of observations. Consider the simple linear regressions of $y$ on $x$ and of $x$ on $y$. Show that $SSE=0$ for both models if and only if both regressions produce the same line. Here SSE denotes the sum of squared residuals.

I understand that SSE measures how much of the variation in $y$ is left unexplained by the model, that is, how much cannot be attributed to a linear relationship.

Also, if $SSE=0$, then the correlation coefficient is $\pm 1$.

But how do I prove that the two regressions give the same line?

Best Answer

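If the $SSE$ is zero for both models, then all the points lie on a single line, which is neither vertical nor horizontal (both regressions require $x$ and $y$ to have nonzero variance). Each fitted line therefore passes through every point, and since the points are not all identical, only one line does so. Hence both models give the same line.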
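A quick numerical check of this direction, as a minimal sketch in plain Python. The helper `fit` and the data are hypothetical, chosen for illustration. On exactly collinear data, both regressions have zero SSE. After the regression of $x$ on $y$, $x=a'y+b'$, is rewritten as $y=\frac{1}{a'}x-\frac{b'}{a'}$, it has the same slope and intercept as the regression of $y$ on $x$.

```python
def fit(u, v):
    """Least-squares fit v = slope * u + intercept; returns (slope, intercept, sse)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suu = sum((ui - mu) ** 2 for ui in u)
    suv = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    slope = suv / suu
    intercept = mv - slope * mu
    sse = sum((vi - (slope * ui + intercept)) ** 2 for ui, vi in zip(u, v))
    return slope, intercept, sse

# Hypothetical collinear data on y = 2x + 1 (neither vertical nor horizontal)
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2 * xi + 1 for xi in x]

a, b, sse_yx = fit(x, y)    # regression of y on x: y = a x + b
a2, b2, sse_xy = fit(y, x)  # regression of x on y: x = a' y + b'

# Rewrite x = a' y + b' as y = (1/a') x - b'/a' to compare with y = a x + b
print(a, b, sse_yx)            # 2.0 1.0 0.0
print(1 / a2, -b2 / a2, sse_xy)  # 2.0 1.0 0.0
```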

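For the other implication, write down the equations of the two regression lines. For the regression of $y$ on $x$, let the regression line be $y=ax+b$. Likewise, for the regression of $x$ on $y$, it's $x=a'y+b'$, which (provided $a'\neq 0$) I prefer to write as $y=\frac{1}{a'}x-\frac{b'}{a'}$. If $a'=0$, the second line is the vertical line $x=b'$, which cannot coincide with the non-vertical first line.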

Since both lines pass through the point $(\bar x,\bar y)$, they are identical iff they have the same slope, i.e. if $a=\frac{1}{a'}$, or equivalently if $aa'=1$.
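To see the centroid property and the slope condition on data that are *not* collinear, here is a minimal sketch in plain Python. The helper `slope_intercept` and the data are hypothetical. Both fitted lines pass through $(\bar x,\bar y)$, but their slopes in the $(x,y)$ plane differ, so $aa'\neq 1$ and the lines are distinct.

```python
def slope_intercept(u, v):
    # Least-squares line v = slope * u + intercept
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    suu = sum((ui - mu) ** 2 for ui in u)
    slope = suv / suu
    return slope, mv - slope * mu

# Hypothetical non-collinear data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
xbar, ybar = sum(x) / len(x), sum(y) / len(y)

a, b = slope_intercept(x, y)    # y = a x + b
a2, b2 = slope_intercept(y, x)  # x = a' y + b'

# Both lines pass through the centroid (xbar, ybar)
print(a * xbar + b, ybar)
print(a2 * ybar + b2, xbar)
# Slope of y on x versus slope 1/a' of the rewritten x-on-y line, and their ratio a * a'
print(a, 1 / a2, a * a2)
```

Here $a = 0.8$ and $1/a' = 1.25$ (up to floating-point rounding), so $aa' = 0.64 < 1$: the two lines cross at the centroid but are not the same line.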

You can check that for the least squares solution, $a=\frac{\operatorname{cov}(x,y)}{\operatorname{var}(x)}$ and $a'=\frac{\operatorname{cov}(x,y)}{\operatorname{var}(y)}$.
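One way to carry out that check, sketched for the regression of $y$ on $x$ under the assumption $\operatorname{var}(x)>0$, is to set the partial derivatives of the sum of squares $S(a,b)$ to zero. The last step uses $\sum_i x_i(x_i-\bar x)=\sum_i (x_i-\bar x)^2$ and $\sum_i x_i(y_i-\bar y)=\sum_i (x_i-\bar x)(y_i-\bar y)$:

$$
\begin{aligned}
S(a,b) &= \sum_{i=1}^n (y_i - a x_i - b)^2,\\
0 = \frac{\partial S}{\partial b} &= -2\sum_{i=1}^n (y_i - a x_i - b) \quad\Longrightarrow\quad b = \bar y - a\bar x,\\
0 = \frac{\partial S}{\partial a} &= -2\sum_{i=1}^n x_i\bigl((y_i-\bar y) - a(x_i-\bar x)\bigr) \quad\Longrightarrow\quad a = \frac{\sum_{i=1}^n (x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^n (x_i-\bar x)^2} = \frac{\operatorname{cov}(x,y)}{\operatorname{var}(x)}.
\end{aligned}
$$

The normalizing factor ($1/n$ or $1/(n-1)$) in $\operatorname{cov}$ and $\operatorname{var}$ cancels in the ratio. Swapping the roles of $x$ and $y$ gives $a'$. The equation $b=\bar y - a\bar x$ is also what guarantees that the line passes through $(\bar x,\bar y)$.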

Hence

$$aa'=\frac{\operatorname{cov}(x,y)^2}{\operatorname{var}(x)\,\operatorname{var}(y)}=\left(\frac{\operatorname{cov}(x,y)}{\sigma_X\,\sigma_Y}\right)^2$$

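That is, $aa'=R^2$, the square of the correlation coefficient (with $\sigma_X$, $\sigma_Y$ the standard deviations of $x$ and $y$), which is also the coefficient of determination of each simple regression. If the regression lines are identical, then $aa'=1$, so $R^2=1$. Since $SSE=(1-R^2)\,SST$ for each model, where $SST$ is the total sum of squares of that model's response, the SSE is zero for both models.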
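Both identities used here, $aa'=r^2$ and $SSE=(1-R^2)\,SST$ for each of the two regressions, can be checked numerically. This is a minimal sketch in plain Python; the helpers `summary` and `sse` and the data are hypothetical.

```python
import math

def summary(x, y):
    # Means and centered sums of squares / cross-products
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return mx, my, sxx, syy, sxy

def sse(u, v, slope, intercept):
    # Sum of squared residuals of v around the line v = slope * u + intercept
    return sum((vi - (slope * ui + intercept)) ** 2 for ui, vi in zip(u, v))

# Hypothetical noisy data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
mx, my, sxx, syy, sxy = summary(x, y)

a = sxy / sxx        # slope of the regression of y on x
a_prime = sxy / syy  # slope of the regression of x on y
r = sxy / math.sqrt(sxx * syy)  # correlation coefficient

sse_yx = sse(x, y, a, my - a * mx)
sse_xy = sse(y, x, a_prime, mx - a_prime * my)

print(a * a_prime, r ** 2)         # a a' = r^2
print(sse_yx, (1 - r ** 2) * syy)  # SSE of y on x = (1 - R^2) * SST_y
print(sse_xy, (1 - r ** 2) * sxx)  # SSE of x on y = (1 - R^2) * SST_x
```

For these data $r^2 = 0.64$ and both SSE values equal $3.6$ (up to rounding). Replacing `y` with exactly collinear values drives $r^2$ to $1$ and both SSE values to $0$.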
