Solved – Log transform produces a sign change in coefficients

Tags: data-transformation, logarithm, mathematical-statistics

I want to regress $y$ on $x_1, x_2, x_3$, and $x_4$.

Question: When fitting the regression model in the regular fashion, without any transformations, the coefficient for $x_4$ is positive. However, when I log-transform the data (all the variables), the sign of the coefficient for $x_4$ becomes negative.

Any idea why a log transform would change the sign?

Best Answer

At first I thought this must have to do with multicollinearity. But then I tried it with just a single predictor, and the same thing can happen there.

The reason is quite simple: it is the noise in the data, together with the fact that we cannot estimate the function (and its derivative) perfectly from a finite sample. Moreover, if you believe there is an underlying true (log-)linear relationship, then at least one of the log-log model and the untransformed model must be misspecified: a relationship cannot be linear on both scales at once.

set.seed(1)
## Log-normal predictor, and a response that is log-linear in XX:
## log(yy) = 0.1 * XX + noise, so the third model below is the true one.
XX <- matrix(exp(rnorm(20)), ncol = 1)
yy <- exp(rnorm(nrow(XX)) + 0.1 * XX)

mod.orig     <- lm(yy ~ XX)            # raw scales
mod.log.log  <- lm(log(yy) ~ log(XX))  # both variables logged
mod.log.orig <- lm(log(yy) ~ XX)       # the true specification

layout(matrix(1:3, ncol = 3))

## Panel 1: raw scales; dashed line marks y = 1 as a common reference
plot(XX, yy)
abline(mod.orig, col = 2)
abline(h = 1, lty = 2, col = 4)

## Panel 2: log-log; dashed line marks log(y) = 0, i.e. y = 1
plot(log(XX), log(yy))
abline(mod.log.log, col = 2)
abline(h = 0, lty = 2, col = 4)

## Panel 3: log(y) vs x, the true model
plot(XX, log(yy))
abline(mod.log.orig, col = 2)
abline(h = 0, lty = 2, col = 4)

[Figure: three scatterplots with fitted regression lines (red) and dashed reference lines (blue): y vs x, log y vs log x, and log y vs x.]
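To see the disagreement numerically rather than visually, you can compare the fitted slopes directly. A minimal sketch using the three models above (the exact values depend on the simulated draw, so treat the numbers as illustrative):

sapply(list(orig = mod.orig, log.log = mod.log.log, log.orig = mod.log.orig),
       function(m) unname(coef(m)[2]))  # slope of each fit; signs need not agree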

Now in your 4-variable case, I am sure multicollinearity plays a role as well (and the correlation between the $X$ and $\log(X)$ versions of the variables is itself very much affected by noise).
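For intuition on the multi-predictor case, here is a hypothetical sketch (not your data; the predictor construction and coefficients are made up for illustration) with four correlated log-normal predictors. In any given sample, the raw-scale and log-log coefficient signs need not match:

## Hypothetical example: correlated log-normal predictors, response
## log-linear in the raw x's. Compare coefficient signs across fits.
set.seed(2)
n <- 50
z <- rnorm(n)  # shared factor inducing collinearity
X <- exp(sapply(1:4, function(j) 0.7 * z + 0.7 * rnorm(n)))
y <- exp(0.3 * X[, 1] - 0.1 * X[, 4] + rnorm(n))

coef(lm(y ~ X))[-1]            # slopes on the raw scale
coef(lm(log(y) ~ log(X)))[-1]  # slopes in the log-log model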

Update: I had originally omitted the specification that is actually linear here, $\log y = \alpha + \beta x$. It is the true data-generating model in this simulation, and I have added it as the third panel.
