Lately I lost access to SPSS, and instead of switching to Python or R I've been doing my analyses in a free program called Jamovi.
The thing is, this software doesn't offer the non-linear regression models I liked to use in SPSS, so I mostly fit linear regression models.
Because of this, I've gotten used to applying log transforms to the data before fitting the model. I noticed that most of the time, after log-transforming a predictor in the linear regression model, the R squared increases.
I know that a log transform can reduce the skewness of the data, but I wonder why it always seems to improve the linear regression model (using only the log-transformed variable as a predictor, not the original variable).
Is this a true improvement or is it some kind of statistical artifact? Why does this happen? Should I use it every time?
Best Answer
There's no reason why this would always improve your $R^2$. Most likely you are forcing a linear fit where it isn't ideal, and your transformation 'linearizes' the relationship with the predictor somewhat.
Here's a quick counterexample where the data generating mechanism is in fact linear:
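The original snippet isn't reproduced here, but a minimal Python sketch of that kind of simulation (all names and parameters are illustrative assumptions, and it checks 1,000 seeds rather than 100,000 for brevity) could be:

```python
import numpy as np

def r_squared(x, y):
    # Fit simple OLS y = b0 + b1*x and return the R^2 of that fit.
    b1, b0 = np.polyfit(x, y, 1)
    resid = y - (b0 + b1 * x)
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

def compare(seed, n=100):
    # Data generating mechanism is truly linear in x (assumed parameters).
    rng = np.random.default_rng(seed)
    x = rng.uniform(1, 10, n)             # positive predictor so log(x) is defined
    y = 2 + 3 * x + rng.normal(0, 1, n)   # linear signal plus Gaussian noise
    return r_squared(x, y), r_squared(np.log(x), y)

# Count how often the untransformed predictor gives the higher R^2.
wins = sum(compare(s)[0] > compare(s)[1] for s in range(1000))
print(f"untransformed x wins in {wins} of 1000 seeds")
```

Because the true relationship is linear in $x$, regressing on $\log x$ can only distort the fit, so the untransformed predictor wins in essentially every simulation.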
The first 100,000 seeds all show a better $R^2$ for the untransformed predictor.