Solved – Linear regression (adding constant to variables)

Tags: multiple-regression, regression

I'm running a multiple linear regression. Suppose I really need to use a logarithmic transformation, but all values of one variable are negative. I assume I have to add a constant first, i.e. use x1 + constant; after that, I can take the logarithm and run the multiple regression.

I'd like to mention that I have done this before without the logarithmic transformation: in a simple linear regression, adding a constant affected only the alpha (intercept) coefficient, which makes perfect sense to me.

For example, I have got the following results:

  • y = 1.08 + 0.56*x1 (original x1)
  • y = -0.03 + 0.56*(x1 + 2) (x1 plus a constant of 2)

So I can use either equation to make predictions and get the same results.
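
This is easy to check in R (a minimal sketch with simulated data, not my actual data; all names here are made up for illustration):

set.seed(1)
x1 <- rnorm(50)
y  <- 1 + 0.5*x1 + rnorm(50)

coef(lm(y ~ x1))         # intercept b0 and slope b1
coef(lm(y ~ I(x1 + 2)))  # intercept b0 - 2*b1, same slope b1

# The fitted values (and hence predictions) agree exactly:
all.equal(fitted(lm(y ~ x1)), fitted(lm(y ~ I(x1 + 2))))  # TRUE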

Is it still possible to interpret the beta coefficients? I am used to relying on elasticities from the logarithmic transformation to show how the independent variables influence y. Do I need to take the added constant into account? If so, how?

Best Answer

I would not do this. The problem is that the constant you choose to add to make x positive is arbitrary, and it can have a huge effect on the parameter estimates.
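
One way to see why, sketched in the notation of the question: if the fitted model is y = a + b*log(x + c), then dy/dx = b/(x + c), so both the estimate of b and the implied effect of x depend on the arbitrary offset c; the usual elasticity reading of b no longer applies.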

First, let's set up x and y and the model:

set.seed(1234)           # for reproducibility

x <- rnorm(100, -10, 1)  # all-negative predictor: Normal with mean = -10, sd = 1
y <- 3*x + rnorm(100)    # true model: y = 3x + standard normal noise

Now, we'll adjust x to be positive so that logs can be taken. Usually, people choose to make the smallest adjusted x close to 0, but how close? Let's try two variations:

xadj1 <- x - min(x) + 0.01  # smallest adjusted value is exactly 0.01
xadj2 <- x - min(x) + 0.1   # smallest adjusted value is exactly 0.1

Now, we fit models:

m1 <- lm(y ~ log(xadj1))
summary(m1)  # fitted: y = -32.52 + 3.29*log(xadj1)

m2 <- lm(y ~ log(xadj2))
summary(m2)  # fitted: y = -33.90 + 4.88*log(xadj2)

And the results are quite different: shifting the minimum from 0.01 to 0.1 moves the slope from 3.29 to 4.88.
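
To see how sensitive the slope is to this arbitrary choice, here is a short sweep over several offsets (a sketch continuing with the x and y simulated above; the offset values are arbitrary):

offsets <- c(0.001, 0.01, 0.1, 1, 10)
sapply(offsets, function(k) coef(lm(y ~ log(x - min(x) + k)))[2])
# The estimated slope changes substantially as the offset changes.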
