[Math] the difference between linear regression on y with x and x with y

mathematica, regression, slope, statistics

I'm plotting the regression line of (GDP change $\%$, Poverty Rate $\%$) $\to (x,y)$ in Mathematica.

What would it mean if I were to switch the axes, i.e. use (Poverty Rate $\%$, GDP change $\%$) instead?

(GDP$\%$,Poverty$\%$) $\to$ Regression line: $13.555 -0.168842x$

(Poverty Rate $\%$, GDP change $\%$) $\to$ Regression line: $0.275437 -0.109956x$

To put it simply, I'm attempting to understand the difference between linear regression on $y$ with $x$ and $x$ with $y$. Not just the difference in slope but what it actually means.

Thanks!

Best Answer

IMHO, the "actual meaning" is not a mathematical question: once you understand the technical reason the coefficients change, the rest is largely a matter of interpretation. In a classical regression analysis you assume that the "real" underlying model explaining the poverty rate ($Y$) in terms of the GDP change ($X$) is $Y = \beta_0 +\beta_1X+\epsilon$, i.e., a linear function plus a noise term $\epsilon$, where the assumptions on the noise term determine the best procedure for estimating $\beta_0$ and $\beta_1$. In this case $Y$ is called the dependent variable, while $X$ is the independent variable. So you are assuming that the poverty rate depends on the GDP change, and, under that model, that changing the GDP would alter the poverty rate.

In $X = \beta_0 +\beta_1Y+\epsilon$ your reasoning is reversed (and $\beta_0$, $\beta_1$ are different coefficients from the ones above): the underlying question is "how does the poverty rate affect the GDP?". Technically, the first fit minimizes the squared errors in $Y$ and the second minimizes the squared errors in $X$, which is why the second line is not simply the algebraic inverse of the first. Either way you are essentially estimating the linear association between $X$ and $Y$. The only difference is in the way you pose the question and how you interpret the results.
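To make the technical side concrete, here is a minimal sketch in Python with NumPy (not the Mathematica code from the question). It fits both regressions on synthetic data. The data, the seed, and the names `ols`, `b_yx` and `b_xy` are assumptions made up for illustration, not the question's dataset. For ordinary least squares with an intercept, the two slopes are $b_{y|x} = r\,s_y/s_x$ and $b_{x|y} = r\,s_x/s_y$, so their product equals $r^2$. If both lines in the question were fit this way on the same data, then $(-0.168842)(-0.109956) \approx 0.0186$, which would suggest a weak correlation of roughly $r \approx -0.14$.

```python
import numpy as np

# Synthetic stand-in data (an assumption): x plays the role of GDP % change,
# y the role of poverty rate %. These are NOT the question's data.
rng = np.random.default_rng(0)
x = rng.normal(2.0, 3.0, size=200)
y = 13.5 - 0.17 * x + rng.normal(0.0, 4.0, size=200)


def ols(u, v):
    """Least-squares fit of v on u; returns (intercept, slope)."""
    slope, intercept = np.polyfit(u, v, 1)  # coefficients, highest degree first
    return intercept, slope


a_yx, b_yx = ols(x, y)  # regress y on x: minimizes vertical residuals
a_xy, b_xy = ols(y, x)  # regress x on y: minimizes horizontal residuals
r = np.corrcoef(x, y)[0, 1]

print("y on x:  y =", a_yx, "+", b_yx, "* x")
print("x on y:  x =", a_xy, "+", b_xy, "* y")
print("product of slopes:", b_yx * b_xy)
print("r squared:        ", r ** 2)
# Solving the x-on-y line for y gives slope 1 / b_xy, which differs from b_yx
# unless the points lie exactly on a line (|r| = 1).
print("slope implied by inverting x-on-y:", 1 / b_xy)
```

Both fitted lines pass through the point of means $(\bar{x}, \bar{y})$, and the weaker the correlation, the further apart their directions are when drawn on the same axes.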