Box-Cox transformation in R

data-transformation r regression

I understand how to use the box cox transformation in R and how to get the graph and lambda.

These are the things that are confusing me. For simplicity assume this example:

 Weight = Gender + Height + Age + Income

gender = categorical variable (1 = male, 0 = female)

continuous variables – weight, height, age, income.

I've done:

model = lm(weight ~ gender + height + age + income)

and applied Box-Cox to this model. The estimated lambda is about 1, so no transformation is needed for weight.

The questions are:

  1. How do I apply Box-Cox to the 'x' variables to see if they need to be transformed? (I read it can be applied to all x variables; the only disadvantage is that it's time consuming, but this isn't an issue for me.)
  2. How do I know for sure if they need a transformation? E.g. if lambda is 0.4, should I use a square root transformation, or does it 'have' to be 0.5? What if lambda is 2?
  3. If one or two or all variables need a transformation then how do I adjust the main model formula?

Best Answer

When referring to Box-Cox transformations, there are really two concepts that look like they are being mixed up. The first is what the original paper was about: a methodology for finding, within a family of transformations, the "best" one, assuming that on the right scale the model has a linear relationship and normal residuals with equal variance. This is what you already did with the response (dependent) variable.
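For reference, that first use can be sketched roughly as below. This assumes `boxcox()` from the MASS package (the question does not say which function was used), and the data frame `dat`, its coefficients, and the seed are simulated purely for illustration. The interval at the end is the usual approximate 95% interval based on the profile log-likelihood. A common convention, which is not part of the original answer, is to pick a convenient, interpretable lambda inside that interval rather than the exact maximiser.

```r
library(MASS)  # provides boxcox()

set.seed(42)
n <- 200
# Simulated, purely illustrative data: not the asker's data set.
dat <- data.frame(
  gender = rbinom(n, 1, 0.5),
  height = rnorm(n, mean = 170, sd = 10),
  age    = runif(n, 20, 70),
  income = rlnorm(n, meanlog = 10, sdlog = 0.5)
)
dat$weight <- 10 + 5 * dat$gender + 0.3 * dat$height +
  0.05 * dat$age + 1e-4 * dat$income + rnorm(n, sd = 3)

# y = TRUE and qr = TRUE store what boxcox() needs, so it does not have to refit.
model <- lm(weight ~ gender + height + age + income, data = dat,
            y = TRUE, qr = TRUE)

# Profile log-likelihood over a grid of lambda values (no plot drawn here).
bc <- boxcox(model, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas whose log-likelihood lies within
# qchisq(0.95, 1) / 2 of the maximum.
in_ci <- bc$y >= max(bc$y) - qchisq(0.95, df = 1) / 2
ci <- range(bc$x[in_ci])

lambda_hat
ci
```

If the interval contains 1, leaving the response untransformed is a defensible choice. If it contains 0.5 or 0, a square root or log is usually easier to explain than a value such as 0.4.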

The second is the family itself. In the paper, Box and Cox presented a family of transformations (actually the paper has more than one, but variations on the main one are what people mainly refer to), and this family (or variations on it) is also often called the Box-Cox transformations. If you want to apply a member of this family to a predictor variable, just plug the variable into the formula.
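"Plugging the variable into the formula" might look like the sketch below. The helper `box_cox()` is a hypothetical name, not a function from any package, the data are simulated, and lambda = 0.5 for income is an arbitrary choice for illustration rather than a recommendation.

```r
# Box-Cox power transform for a strictly positive vector x:
# (x^lambda - 1) / lambda, with log(x) as the limiting case lambda = 0.
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

set.seed(1)
n <- 200
# Simulated, purely illustrative data.
dat <- data.frame(
  gender = rbinom(n, 1, 0.5),
  height = rnorm(n, mean = 170, sd = 10),
  age    = runif(n, 20, 70),
  income = rlnorm(n, meanlog = 10, sdlog = 0.5)
)
dat$weight <- 20 + 5 * dat$gender + 0.3 * dat$height +
  0.05 * dat$age + 2 * log(dat$income) + rnorm(n, sd = 3)

# Functions can be called directly inside an lm() formula.
fit_log <- lm(weight ~ gender + height + age + log(income), data = dat)
fit_bc  <- lm(weight ~ gender + height + age + box_cox(income, 0.5), data = dat)

# Arithmetic such as ^ must be wrapped in I() inside a formula.
fit_sqrt <- lm(weight ~ gender + height + age + I(income^0.5), data = dat)

# box_cox(x, 0.5) is a linear function of sqrt(x), so the two fits agree.
all.equal(fitted(fit_bc), fitted(fit_sqrt))
```

Because the model has an intercept, the shift and scale in the Box-Cox formula only change how the coefficient is parameterised. The fit is the same as with the plain power, so the simpler `I(income^0.5)` or `log(income)` form is usually enough.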

To determine what types of transformations to apply to predictor variables, the most important thing to use is knowledge about your data and the science behind it. How do you expect weight to change with age, height, or income? (I would be surprised if a Box-Cox transform is the best for any of these).

There are tools like spline fits, ACE, AVAS, and others that can suggest transformations, but you need to use knowledge and common sense to convert these into meaningful transformations.
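As one example of such a tool, a spline fit could be sketched as follows, using `ns()` from the splines package that ships with R. The data are simulated, and the choice of `df = 3` and of income as the predictor to examine are assumptions made only for illustration.

```r
library(splines)  # provides ns(), a natural cubic spline basis

set.seed(1)
n <- 200
# Simulated, purely illustrative data with a nonlinear income effect.
dat <- data.frame(
  gender = rbinom(n, 1, 0.5),
  height = rnorm(n, mean = 170, sd = 10),
  age    = runif(n, 20, 70),
  income = rlnorm(n, meanlog = 10, sdlog = 0.5)
)
dat$weight <- 20 + 5 * dat$gender + 0.3 * dat$height +
  0.05 * dat$age + 2 * log(dat$income) + rnorm(n, sd = 3)

# Linear term versus a flexible spline term for income.
fit_lin <- lm(weight ~ gender + height + age + income, data = dat)
fit_spl <- lm(weight ~ gender + height + age + ns(income, df = 3), data = dat)

# F test of whether the extra flexibility improves the fit.
cmp <- anova(fit_lin, fit_spl)
cmp

# Optionally inspect the fitted shape of the income term, for example with
# termplot(fit_spl, terms = "ns(income, df = 3)", partial.resid = TRUE)
```

The shape of the fitted curve is only a suggestion. Whether to replace it with a log, a square root, or nothing at all is still a judgment based on what makes sense for the data.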