Solved – Back-transformation and interpretation of $\log(X+1)$ estimates in multiple linear regression

data transformation, logarithm

I have performed multiple linear regression analyses with different combinations of transformed and untransformed variables, both explanatory (independent) and response (dependent). All transformations were $\log_{10}(X+1)$, which seems to fit the normality assumptions better. I also have indicator (dummy) variables among the explanatory variables. I'm trying to figure out how to interpret the regression estimates, so I would be much obliged if someone could point me toward a good web-based source of information on this, and/or answer the questions below. Thanks in advance.

I am wondering:

When back-transforming, do I subtract the constant (1) from the regression estimates (after raising 10 to the power of the estimate), or only when reporting the mean/median for Y? In other words, does adding the constant to the response variable (before log transformation) matter as far as reporting the regression estimates for the explanatory variables?

When do I subtract the constant from the explanatory variable if it is transformed?

Also, for example, after constructing the regression model for a response variable that was transformed as $\log_{10}(X+1)$, my indicator (explanatory) variable estimate is:

Estimate: 0.008
SE: 0.007
t: 1.110
P: 0.2660   

with a 95% confidence interval (on the log10 scale) of -0.0059 to 0.0213.
I back-transform and get an estimate of 1.017871372 (95% CI from 0.9865 to 1.05). I interpret this as "median relative abundance of RESPONSE VARIABLE (which is approx. 0.04) at INDICATOR variable sites is 1.0179 times greater (95% CI = 0.9865 to 1.05) than the median relative abundance at sites where the INDICATOR variable is not present, after accounting for other factors".
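For reference, the back-transformation arithmetic can be sketched in a few lines of Python. This is only an illustration using the figures quoted above. The names `back_transform` and `implied_y` are made up for this sketch, and the point estimate used is the rounded 0.008, so its back-transform comes out slightly different from 1.0179 (which presumably came from the unrounded estimate). Under a $\log_{10}(Y+1)$ response, raising 10 to the power of the coefficient gives a multiplicative factor on $Y+1$, not on $Y$ itself, so the constant is added before multiplying and subtracted afterwards when moving back to the scale of $Y$.

```python
# Values copied from the question (log10 scale). The point estimate is the
# rounded figure shown in the table, which is an assumption of this sketch.
estimate = 0.008
ci_low, ci_high = -0.0059, 0.0213


def back_transform(b):
    """Multiplicative factor on (Y + 1) implied by a log10(Y + 1) coefficient."""
    return 10 ** b


ratio = back_transform(estimate)
ratio_low = back_transform(ci_low)
ratio_high = back_transform(ci_high)
print(f"factor on the Y+1 scale: {ratio:.4f} "
      f"(95% CI {ratio_low:.4f} to {ratio_high:.4f})")


def implied_y(baseline_y, b):
    """Y implied by the model when the indicator switches from 0 to 1.

    The constant 1 is added before applying the factor and subtracted after.
    """
    return (baseline_y + 1) * back_transform(b) - 1


baseline = 0.04  # approximate typical value of Y quoted in the question
shifted = implied_y(baseline, estimate)
print(f"baseline Y = {baseline}, implied Y with indicator = {shifted:.4f}")
```

The confidence limits are back-transformed the same way as the estimate, with no subtraction, because they are also ratios on the $Y+1$ scale.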

If anyone could let me know if I'm on the right track, or how to get on the right track, that would be great.

Best Answer

According to Wooldridge 2009 (p. 192), the $\log(1+x)$ transformation may retain the usual interpretation of $\log(x)$:

In cases where a variable $y$ is nonnegative but can take on the value 0, $\log(1+y)$ is sometimes used. The percentage change interpretations are often closely preserved, except for changes beginning at $y=0$ (where the percentage change is not even defined). Generally, using $\log(1+y)$ and then interpreting the estimates as if the variable were $\log(y)$ is acceptable when the data contain relatively few zeros.

I suspect this extends to base 2 or base 10 as well, since changing the base of the logarithm only rescales it by a constant factor.
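To get a feel for how close the approximation is, here is a small Python sketch (my own illustration, not from Wooldridge). The baselines 100 and 0.04 are arbitrary assumptions, the latter picked to match the median mentioned in the question, and the helper names are hypothetical. It compares the change in $\log(y)$ with the change in $\log(1+y)$ for the same 10% increase in $y$, and then checks that the back-transformed ratio does not depend on the base of the logarithm.

```python
import math


def delta_log(y0, y1):
    """Change in log(y) when y moves from y0 to y1."""
    return math.log(y1) - math.log(y0)


def delta_log1p(y0, y1):
    """Change in log(1 + y) when y moves from y0 to y1."""
    return math.log1p(y1) - math.log1p(y0)


# A 10% increase starting from a baseline that is large relative to 1.
large = (delta_log(100, 110), delta_log1p(100, 110))
# The same 10% increase starting from a baseline that is small relative to 1.
small = (delta_log(0.04, 0.044), delta_log1p(0.04, 0.044))

print(f"y: 100 -> 110     log: {large[0]:.4f}  log(1+y): {large[1]:.4f}")
print(f"y: 0.04 -> 0.044  log: {small[0]:.4f}  log(1+y): {small[1]:.4f}")

# Changing the base rescales the difference by a constant, so the
# back-transformed ratio of (1 + y) values is the same in either base.
d_ln = delta_log1p(100, 110)
d_log10 = math.log10(1 + 110) - math.log10(1 + 100)
ratio_from_ln = math.exp(d_ln)
ratio_from_log10 = 10 ** d_log10
print(f"ratio from ln: {ratio_from_ln:.6f}, from log10: {ratio_from_log10:.6f}")
```

In this sketch the two differences nearly coincide when $y$ is large relative to 1 and diverge when $y$ is small relative to 1, so it may be worth checking the typical magnitude of the response before relying on the percentage-change reading.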
