Does one need to transform percentages/proportions for a multiple linear regression?

Tags: data transformation, generalized linear model, logit, multiple regression, regression

I am aware that one should transform percentages and proportions when using them in an ANOVA, due to the values being bounded by 0 and 1. I have seen suggestions that the best transformations are logit and arcsine (with benefits/problems with both).

However, I have two linked questions about a multiple linear regression.

1) Does one still need to transform the percentages and proportions when using them as predictor variables in a multiple linear regression? Or can they be left in their raw form?

2) How about when using percentages/proportions as an outcome variable in a multiple linear regression?

Clarification: As discussed in my original question, I am particularly interested in whether the guidance depends on the percentages/proportions being used as an outcome or predictor variable in a linear regression.

Best Answer

Whether a variable is expressed as a percentage matters less than the underlying distribution of that variable and of the residuals of the linear regression. In fact, it may be argued that most measured variables are in some way bounded (e.g. maximum possible temperature) and discrete. In some cases proportional variables lend themselves to linear regression without transformation, and in others they are so clustered or skewed that no transformation can mitigate the problem. The arcsine and logit transformations work for intermediate cases, particularly when there are a lot of values close to 0 and 1.
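One way to check this on your own data is to fit the regression on the raw proportion and on each transformed version, then compare the residuals. Below is a minimal sketch of that idea using NumPy only. The simulated data, the helper names (`logit`, `arcsine_sqrt`, `ols_residuals`, `skewness`) and the clipping constant `eps` are my own assumptions for illustration, not part of the question. Skewness is only one rough diagnostic, and a residual plot would tell you more.

```python
import numpy as np

def logit(p, eps=1e-6):
    """Log-odds transform. eps is an assumed constant that keeps exact 0s and 1s finite."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return np.log(p / (1.0 - p))

def arcsine_sqrt(p):
    """Arcsine square-root transform. Maps [0, 1] onto [0, pi/2]."""
    return np.arcsin(np.sqrt(np.asarray(p, dtype=float)))

def ols_residuals(x, y):
    """Residuals from an ordinary least squares fit of y on x with an intercept."""
    A = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ beta

def skewness(r):
    """Sample skewness, used here as a rough symmetry check on residuals."""
    r = r - r.mean()
    return (r ** 3).mean() / (r ** 2).mean() ** 1.5

# Simulated data (hypothetical): a proportion outcome clustered near 0,
# generated so that it is linear in x on the logit scale.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
noise = rng.normal(scale=0.5, size=500)
p = 1.0 / (1.0 + np.exp(-(-2.0 + 1.5 * x + noise)))

# Fit the same regression on each version of the outcome and inspect residuals.
for name, y in [("raw", p), ("logit", logit(p)), ("arcsine", arcsine_sqrt(p))]:
    res = ols_residuals(x, y)
    print(f"{name:8s} residual skewness = {skewness(res): .3f}")
```

If the residuals from the raw fit already look well behaved, there is little reason to transform. If no version gives reasonable residuals, that is the "too clustered or skewed" case described above.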