Regression – Should Quantitative Predictors Be Transformed to Be Normally Distributed?

Tags: data transformation, glmm, normality-assumption, predictor, regression

I always struggle with testing quantitative predictors (not factors) for normality and with transforming them toward normality.

  • If I am running a GLMM and my predictors are really non-normal, should I transform them as well to try to make them normally distributed?
  • I know that this is important for the response variable, but what should be done with the predictors?

P.S.: I really could not find a similar question.

Best Answer

Nothing in the theory behind regression models requires X to follow any particular distribution. All that is needed is a minimum number of observations in each range of X about which you want to learn something. The only problem you usually run into is overly influential observations caused by a heavy right tail in the distribution of X. To deal with that, I often fit something like a restricted cubic spline in the cube root or square root of X. With the R rms package this would look like y ~ rcs(x^(1/3)) + ... other variables, or rcs(sqrt(x), 5) + ... (the 5 requests 5 knots using default knot placement). That way you assume only a smooth relationship, but you limit the influence of large values while still allowing zeros (though not negative values).
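The point about a heavy right tail can be checked numerically. The sketch below is in Python with NumPy rather than R, and it uses a plain straight-line fit instead of the rcs() spline, so it illustrates only the leverage argument and not the rms model itself. The predictor values and the helper max_leverage are made up for illustration.

```python
import numpy as np

def max_leverage(x):
    """Largest hat-matrix diagonal for a straight-line fit y ~ 1 + x."""
    # Design matrix with an intercept column and the predictor.
    X = np.column_stack([np.ones_like(x), x])
    # Leverage of observation i: h_i = x_i^T (X^T X)^{-1} x_i
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    return h.max()

# Hypothetical right-skewed predictor: one zero plus 49 log-spaced values.
x = np.concatenate([[0.0], np.exp(np.linspace(0.0, 6.0, 49))])

lev_raw = max_leverage(x)            # untransformed predictor
lev_sqrt = max_leverage(np.sqrt(x))  # square-root scale
lev_cbrt = max_leverage(np.cbrt(x))  # cube-root scale

print(f"max leverage, raw x:     {lev_raw:.3f}")
print(f"max leverage, sqrt(x):   {lev_sqrt:.3f}")
print(f"max leverage, x^(1/3):   {lev_cbrt:.3f}")
```

On data like these, the largest leverage is highest for raw x and drops after the square-root or cube-root transform. That is the sense in which the transform limits the influence of large values, and the zero in x passes through both transforms without trouble.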
