Confusion with linear regression coefficient when variables are reversed


Given $Y$ and $X$ in a typical linear regression model, where

$$Y = \beta_1 X + \epsilon_1 $$

We know that $\hat{\beta_1} = (X^TX)^{-1}X^TY$.

Assuming that the sample means of $X$ and $Y$ are zero, and denoting the sample standard deviations by $\sigma_x$ and $\sigma_y$ and the sample correlation by $\rho_{xy}$, we have:

$$\hat{\beta_1} = \frac{\sigma_y}{\sigma_x} \rho_{xy} $$

If $\sigma_y = \sigma_x$, this simplifies to

$$\hat{\beta_1} = \rho_{xy} $$

Now, if we carry out the same analysis on the following regression problem:

$$ X = \beta_2 Y + \epsilon_2$$

We will also get $$\hat{\beta_2} = \rho_{xy} = \hat{\beta_1}$$

My confusion is the following: why is $\hat{\beta}_2 = \hat{\beta}_1$ rather than $\hat{\beta}_2 = \hat{\beta}_1^{-1}$? I'm looking at this from a geometric viewpoint (if $Y = mX$, then $X = \frac{1}{m}Y$), and I can't see why that reasoning does not apply here.
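For concreteness, the claim can be checked numerically. This is a sketch with simulated data (the seed, sample size, and true slope of 0.5 are my own choices, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate correlated data, then standardize so both samples have
# zero mean and unit standard deviation (i.e. sigma_x = sigma_y).
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)
x = (x - x.mean()) / x.std()
y = (y - y.mean()) / y.std()

# No-intercept OLS slopes in both directions.
beta1 = (x @ y) / (x @ x)  # slope of y = beta1 * x
beta2 = (y @ x) / (y @ y)  # slope of x = beta2 * y
rho = np.corrcoef(x, y)[0, 1]

print(beta1, beta2, rho)  # beta1 == beta2 == rho (up to rounding)
```

Both slopes come out equal to the sample correlation, not reciprocals of each other.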

Best Answer

Note that the OLS estimate for $y=\beta x + \epsilon$ is $$ \hat{\beta} = \frac{ \sum x_i y_i }{ \sum x_i^2 } = \frac{\langle \mathrm{x},\mathrm{y} \rangle}{ \| \mathrm{x} \|^2}. $$ Now rescale both vectors to unit norm (any common constant works), so that $$ \hat{\beta} = \langle \mathrm{x},\mathrm{y} \rangle = \| \mathrm{x} \| \| \mathrm{y} \| \cos \theta = \cos \theta. $$ If you reverse the order and fit $x = \beta y+\epsilon$, the OLS estimate of $\beta$ is still $\cos \theta$, i.e., the cosine of the angle between the two random vectors. Therefore, you get the same slope.
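The argument above can be illustrated numerically. This is a sketch with simulated data (the seed and the true slope of 0.8 are assumptions for the demo, not part of the answer):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 0.8 * x + rng.normal(size=500)

# Rescale each vector to unit norm (the normalization step above).
x = x / np.linalg.norm(x)
y = y / np.linalg.norm(y)

beta_yx = (x @ y) / (x @ x)  # OLS slope of y = beta * x
beta_xy = (y @ x) / (y @ y)  # OLS slope of x = beta * y
cos_theta = x @ y            # <x, y> = ||x|| ||y|| cos(theta) = cos(theta)

print(beta_yx, beta_xy, cos_theta)  # all three coincide
```

After normalization the denominators $\|\mathrm{x}\|^2$ and $\|\mathrm{y}\|^2$ are both 1, so each slope reduces to the same inner product, i.e. the same cosine.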