Regression – Why Higher R-squared After Log Transformation

data transformation, logarithm, regression

Lately I lost access to SPSS, and instead of using Python or R, I have been performing my analyses with a free program called Jamovi.

The problem is that this program doesn't offer the non-linear regression models I used to rely on in SPSS, so I now fit only linear regression models.

Because of this, I have gotten into the habit of log-transforming the data before fitting the model. I noticed that most of the time, after log-transforming a predictor in the linear regression, the R-squared increases.

I know that a log transform can reduce the skewness of the data, but I wonder why it always seems to improve the linear regression model (using only the log-transformed variable as a predictor, not the original variable).

Is this a true improvement or is it some kind of statistical artifact? Why does this happen? Should I use it every time?

Best Answer

There's no reason why this should always improve your $R^2$. Most likely it happens because you are forcing a linear fit where the relationship isn't linear, and the log transformation 'linearizes' the relationship between the predictor and the response somewhat.

Here's a quick counterexample where the data-generating mechanism is in fact linear:

set.seed(1)
x <- runif(1e2)                    # 100 uniform predictor values
y <- x + rnorm(1e2, 0, 0.1)        # truly linear relationship plus noise

summary(fit <- lm(y ~ x))$r.squared           # 0.897
summary(lfit <- lm(y ~ log(x)))$r.squared     # 0.736

The first 100,000 seeds all show a better $R^2$ for the untransformed predictor.
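Conversely, here is a quick sketch (my own, under the assumption that the true relationship is logarithmic, which may or may not match your data) showing why the transform can help when the raw relationship is not linear:

set.seed(1)
x <- runif(1e2)
y <- log(x) + rnorm(1e2, 0, 0.1)              # assumed logarithmic relationship plus noise

summary(lm(y ~ x))$r.squared                  # noticeably lower: straight line is a poor fit
summary(lm(y ~ log(x)))$r.squared             # close to 1, since the model matches the truth

If your data look like this second case, the higher $R^2$ after transformation reflects a genuinely better linear fit for that sample, not an artifact; the transform has simply linearized the predictor-response relationship, as described above.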