Solved – Regression with inverse independent variable

data transformation, linear model, regression

Let's suppose I have an $N$-vector $Y$ of values of the dependent variable, and an $N$-vector $X$ of values of the independent variable. When $Y$ is plotted against $\frac{1}{X}$, I see that there is a linear relationship (upward trend) between the two. Now, this also means that there is a linear downward trend between $Y$ and $X$.

Now, if I run the regression: $Y = \beta * X + \epsilon$
and get the fitted value $\hat{Y} = \hat{\beta}X$

Then I run the regression: $Y = \alpha * \frac{1}{X} + \epsilon$ and get the fitted value $\tilde{Y} = \hat{\alpha} \frac{1}{X}$

Will the two sets of predicted values, $\hat{Y}$ and $\tilde{Y}$, be approximately equal?

Best Answer

> When $Y$ is plotted against $\frac{1}{X}$, I see that there is a linear relationship (upward trend) between the two. Now, this also means that there is a linear downward trend between $Y$ and $X$.

The last sentence is wrong: there is a downward trend, but it is by no means linear:

[Figure: two scatter plots, Y ~ 1 / X and Y ~ X]

I used $f(x) = \frac{1}{x}$ as the function, plus a bit of noise on $Y$. As you can see, while plotting $Y$ against $\frac{1}{X}$ yields linear behaviour, $Y$ against $X$ is far from linear.

(@whuber points out that the $Y$ against $\frac{1}{X}$ plot doesn't look homoscedastic. I think it appears to have higher variance at low $Y$ because the case density is much higher there, so the observed range is larger, and that range is essentially what we perceive. In fact, the data are homoscedastic: I used Y = 1 / X + rnorm(length(X), sd = 0.1) to generate them, so the noise does not depend on the size of $X$.)
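The simulation described above also answers the original question directly. Here is a minimal Python sketch of it (the answer's own snippet is R). The sample size, the grid of $X$ values on [0.2, 5], the seed, and the helper names are assumptions made for illustration. It fits both through-origin regressions from the question and compares their fitted values.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Assumed design: 200 points spread evenly over [0.2, 5]. The noise mirrors
# the answer's rnorm(length(X), sd = 0.1).
n = 200
X = [0.2 + 4.8 * i / (n - 1) for i in range(n)]
Y = [1.0 / x + random.gauss(0.0, 0.1) for x in X]


def slope_through_origin(u, y):
    """Least-squares slope for y = b * u with no intercept."""
    return sum(ui * yi for ui, yi in zip(u, y)) / sum(ui * ui for ui in u)


def rmse(a, b):
    """Root-mean-square difference between two equally long sequences."""
    return (sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)) ** 0.5


inv_X = [1.0 / x for x in X]
beta_hat = slope_through_origin(X, Y)       # model: Y = beta * X + eps
alpha_hat = slope_through_origin(inv_X, Y)  # model: Y = alpha / X + eps

Y_hat = [beta_hat * x for x in X]          # fitted values from the X model
Y_tilde = [alpha_hat * u for u in inv_X]   # fitted values from the 1/X model

rmse_x = rmse(Y, Y_hat)
rmse_inv = rmse(Y, Y_tilde)
gap = rmse(Y_hat, Y_tilde)

print(f"beta_hat  = {beta_hat:.3f}, RMSE = {rmse_x:.3f}")
print(f"alpha_hat = {alpha_hat:.3f}, RMSE = {rmse_inv:.3f}")
print(f"RMS gap between the two sets of fitted values = {gap:.3f}")
```

With positive $X$ and $Y$, the through-origin slope on $X$ comes out positive, so $\hat{Y}$ rises with $X$ while the data fall. The two sets of fitted values are therefore not approximately equal on a range like this one.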

So in general the relationship is very much non-linear. That is, unless your range of $X$ is so narrow that you can approximate $\frac{d}{dx}\frac{1}{x} = -\frac{1}{x^2} \approx \text{const}$. Here's an example:

[Figure: two scatter plots, Y ~ 1 / X and Y ~ X, over a narrow range of $X$]
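To make "narrow enough" a bit more concrete, here is a short worked expansion around a reference point $x_0 > 0$ (the symbol $x_0$ is introduced here for illustration and is not part of the original answer):

$$\frac{1}{x} = \frac{1}{x_0} - \frac{x - x_0}{x_0^2} + \frac{(x - x_0)^2}{x\,x_0^2}$$

The first two terms are the tangent line at $x_0$, and the last term is the exact remainder. Relative to $\frac{1}{x}$, the remainder equals $\left(\frac{x - x_0}{x_0}\right)^2$, so the tangent line is a good approximation only while $|x - x_0| \ll x_0$. For instance, if $X$ stays within 10% of $x_0$, the relative error is at most 1%. A least-squares line is not exactly the tangent line, but the same order of magnitude applies.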

Bottom line:

  • In general, it is very hard to approximate a $\frac{1}{X}$-type function by a linear or polynomial function. Without an offset (intercept) term, you will never get a reasonable approximation.
  • If the $X$ interval is narrow enough to allow a linear approximation, you will not be able to tell from the data that the relation should be $\frac{1}{X}$ rather than linear in $X$.
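Both points can be checked with a small simulation. The sketch below is illustrative only: the "wide" range [0.2, 5], the "narrow" range [1, 1.3], the sample size, the seed, and the helper names `rmse_of_fit` and `compare` are all assumptions, and the noise level follows the sd = 0.1 used above. It compares the residual error of three least-squares fits on each range.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable


def rmse_of_fit(u, y, intercept=True):
    """Fit y = a + b*u by least squares (a is fixed at 0 if intercept=False)
    and return the root-mean-square residual."""
    n = len(u)
    if intercept:
        mu, my = sum(u) / n, sum(y) / n
        b = (sum((ui - mu) * (yi - my) for ui, yi in zip(u, y))
             / sum((ui - mu) ** 2 for ui in u))
        a = my - b * mu
    else:
        a = 0.0
        b = sum(ui * yi for ui, yi in zip(u, y)) / sum(ui * ui for ui in u)
    return (sum((yi - a - b * ui) ** 2 for ui, yi in zip(u, y)) / n) ** 0.5


def compare(lo, hi, n=200, sd=0.1):
    """Simulate Y = 1/X + noise on [lo, hi] and return the RMSE of three fits."""
    X = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    Y = [1.0 / x + random.gauss(0.0, sd) for x in X]
    inv_X = [1.0 / x for x in X]
    return {
        "X, no intercept": rmse_of_fit(X, Y, intercept=False),
        "X, intercept": rmse_of_fit(X, Y),
        "1/X, intercept": rmse_of_fit(inv_X, Y),
    }


wide = compare(0.2, 5.0)    # assumed "wide" range of X
narrow = compare(1.0, 1.3)  # assumed "narrow" range of X

for label, result in (("wide", wide), ("narrow", narrow)):
    for model, err in result.items():
        print(f"{label:6s}  {model:16s}  RMSE = {err:.3f}")
```

On the wide range, the line in $X$ should fit far worse than the line in $\frac{1}{X}$. On the narrow range, the two fits with an intercept should be practically indistinguishable, while the fit without an intercept stays clearly worse.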