Solved – Log transformation for percentages or proportion in regression

I am studying the impact of vehicle registration fees on vehicle miles travelled. I am using linear regression, and I found that log-transforming the dependent variable and some of the control variables improves $R^2$ and reduces the standard error. A few of my control variables are percentages / proportions. I checked them for normality and they are normally distributed, but I still have to log-transform them because $R^2$ is higher than when they are left untransformed. Can somebody explain to me why this happens?

Best Answer

There's no requirement for the control variables to have any particular distribution. Indeed, the marginal distribution of the IVs, or even of the DV, is not an issue at all: the usual regression assumptions concern the errors (the distribution of the DV conditional on the IVs), so by testing the IVs for normality you're checking something that is not related to any regression assumption. (Many posts on this site discuss regression assumptions in detail.)

Nor is there any reason to expect that a transformation that makes the relationship between an IV and the DV more linear will keep that IV's distribution looking normal; it may well make it less normal. Since the distributional shape of an IV is of no consequence, gaining linearity at the expense of normality should not worry you in and of itself.
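
To make this concrete, here is a minimal simulation sketch using only the Python standard library. Everything in it is an assumption made for illustration and not the asker's data: a hypothetical control variable `x` drawn from a truncated normal on (0.05, 0.95), and a DV that is, by construction, linear in `log(x)`. It compares the $R^2$ of a simple regression on `x` with that on `log(x)`, alongside the skewness of each version of the IV.

```python
import math
import random

random.seed(0)


def truncated_normal(mean, sd, low, high):
    """Draw from a normal distribution, rejecting values outside [low, high]."""
    while True:
        value = random.gauss(mean, sd)
        if low <= value <= high:
            return value


def mean(values):
    return sum(values) / len(values)


def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (squared Pearson correlation)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def skewness(values):
    """Sample skewness: third central moment over the cubed standard deviation."""
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])
    m3 = mean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


n = 2000
# Hypothetical control variable: a roughly normal proportion.
x = [truncated_normal(0.4, 0.12, 0.05, 0.95) for _ in range(n)]
log_x = [math.log(v) for v in x]
# Assumed data-generating process: the DV is linear in log(x), not in x.
y = [2.0 + 3.0 * lx + random.gauss(0.0, 0.1) for lx in log_x]

r2_raw = r_squared(x, y)
r2_log = r_squared(log_x, y)
skew_raw = skewness(x)
skew_log = skewness(log_x)

print(f"R^2 with x:      {r2_raw:.3f}   skewness of x:      {skew_raw:+.2f}")
print(f"R^2 with log(x): {r2_log:.3f}   skewness of log(x): {skew_log:+.2f}")
```

Under these assumptions the log-transformed IV gives the better fit even though it is the more skewed of the two, which is the pattern described in the question: the fit depends on the shape of the relationship, not on the shape of the IV's distribution.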

However, if you try several transformations, choose the one with the highest $R^2$, and then use the same data for hypothesis tests, confidence intervals or prediction intervals, among other things (i.e. you evaluate the model, or predict from it, on the same data you used to choose it), then those statistical procedures no longer have their nominal properties: among other things, p-values tend to be too low and standard errors too small.
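
The size of this effect can be gauged with a small simulation. The sketch below is illustrative only and rests on assumptions that are not in the question: a hypothetical proportion-valued IV, a DV generated independently of it (so every rejection is a false positive), five arbitrary candidate transformations, and 30 observations per data set. For each simulated data set it tests the slope once with a pre-specified form of the IV and once with whichever transformation gave the highest $R^2$.

```python
import math
import random

random.seed(1)

N_OBS = 30
N_SIMS = 2000
# Two-sided 5% critical value of Student's t with N_OBS - 2 = 28 df.
T_CRIT = 2.048

# Hypothetical candidate transformations of a proportion-valued IV.
TRANSFORMS = {
    "x": lambda v: v,
    "log(x)": math.log,
    "sqrt(x)": math.sqrt,
    "x^2": lambda v: v * v,
    "1/x": lambda v: 1.0 / v,
}


def correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def slope_is_significant(r):
    """Slope t-test of a simple regression, expressed through the correlation r."""
    t = r * math.sqrt((N_OBS - 2) / (1.0 - r * r))
    return abs(t) > T_CRIT


fixed_rejections = 0
selected_rejections = 0
for _ in range(N_SIMS):
    x = [random.uniform(0.02, 0.98) for _ in range(N_OBS)]
    y = [random.gauss(0.0, 1.0) for _ in range(N_OBS)]  # generated independently of x
    r_by_transform = {
        name: correlation([f(v) for v in x], y) for name, f in TRANSFORMS.items()
    }
    # Pre-specified analysis: always use untransformed x.
    fixed_rejections += slope_is_significant(r_by_transform["x"])
    # Data-driven analysis: keep the transformation with the highest R^2 = r^2.
    best_r = max(r_by_transform.values(), key=abs)
    selected_rejections += slope_is_significant(best_r)

rate_fixed = fixed_rejections / N_SIMS
rate_selected = selected_rejections / N_SIMS
print(f"false-positive rate, pre-specified x:    {rate_fixed:.3f}")
print(f"false-positive rate, best-R^2 transform: {rate_selected:.3f}")
```

The pre-specified test should reject at roughly the nominal 5%, while the test applied after selection can only reject as often or more often, since the selected $R^2$ is never smaller than the pre-specified one. How large the gap is depends on the number of candidates and how similar they are. Choosing the transformation on one part of the data and doing inference on a held-out part is one common way to avoid the problem.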