Solved – Adding Interaction Terms to Multiple Linear Regression, how to standardize

interaction, multiple regression, regression, standardization

I am currently running a multiple linear regression, and I am a bit confused about how to properly add interaction terms to the model by hand. All of the variables I am using are continuous and have different scales and units.

So far, the way I have done it has been to

  1. Standardize the observations for each variable

  2. Multiply corresponding standardized values from specific variables to create the interaction terms and then add these new variables to the set of regression data

  3. Run the regression

Is this the correct way to go about doing this? Should I also standardize the interaction terms after calculating the 'raw' products?

Best Answer

  1. Standardization is optional, not required. Mean-centering (one part of standardization) makes the lower-order terms more interpretable. Penguin_Knight showed that standardizing after forming the interaction term, rather than before, gives you the same results as the unstandardized model. This is a consequence of the change in interpretation of the lower-order terms when you mean-center variables before forming the interaction term. Both of his outputs are valid (note that the interaction t value is identical); you just need to know how to interpret the lower-order coefficients (the main effects, in ANOVA terms). In short, when you mean-center or standardize before forming your interaction terms, the mpg effect is the effect of mpg for a car of average weight, because it is the effect when every variable it interacts with equals 0, and for weight we have set 0 to equal the mean. Without mean-centering or standardizing, the mpg effect is the effect of mpg for a car that weighs 0 pounds. Mean-centering therefore usually improves interpretability, since cars can't weigh 0 pounds.

  2. This is correct but is missing some details. For continuous variables, you only need to multiply the two variables to form an interaction (again, after mean-centering or standardizing if you wish). When categorical variables are involved, you can create an interaction term by first creating separate numerical variables that correspond to contrasts of interest. You can create at most as many contrasts as your categorical variable has levels, minus 1. You do not need to use a full set of contrast codes, however. Once you have your columns of contrast codes, you create your interactions the same way as before, since you are now merely multiplying two numerical variables. Note: this works for categorical:categorical and categorical:continuous interactions, and for any combination of these at higher orders of interaction.

  3. Run your regression. I have been assuming that you also have the lower-order variables in the model (i.e. $y=a+b+ab$ rather than $y=ab$); leaving them out would change how you interpret the results.
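To make point 1 concrete, here is a minimal sketch in Python with NumPy. It is not Penguin_Knight's output: the data are synthetic, and `x` and `z` are only hypothetical stand-ins for two continuous predictors on different scales (think mpg and weight). The helper `fit` is likewise an illustrative name. The sketch fits the same model on raw, mean-centered, and standardized predictors, forming the interaction after the transformation each time.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Synthetic predictors on very different scales (assumed, not real data)
x = rng.normal(20.0, 5.0, n)
z = rng.normal(3000.0, 700.0, n)
y = 10.0 + 0.5 * x - 0.002 * z + 0.0003 * x * z + rng.normal(0.0, 1.0, n)

def fit(a, b, y):
    """OLS of y on [1, a, b, a*b]; returns (coefficients, t values)."""
    X = np.column_stack([np.ones_like(a), a, b, a * b])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    # diag((X'X)^-1) obtained from the R factor of a QR decomposition
    r_inv = np.linalg.inv(np.linalg.qr(X, mode="r"))
    se = np.sqrt(sigma2 * np.sum(r_inv**2, axis=1))
    return coef, coef / se

# Raw predictors, interaction formed from raw values
raw_coef, raw_t = fit(x, z, y)

# Mean-center first, then form the interaction
xc, zc = x - x.mean(), z - z.mean()
cen_coef, cen_t = fit(xc, zc, y)

# Standardize first, then form the interaction
xs, zs = xc / x.std(ddof=1), zc / z.std(ddof=1)
std_coef, std_t = fit(xs, zs, y)

print("interaction t values:", raw_t[3], cen_t[3], std_t[3])
print("effect of x, raw model (at z = 0):       ", raw_coef[1])
print("effect of x, centered model (at mean z): ", cen_coef[1])
print("raw effect of x evaluated at mean z:     ", raw_coef[1] + raw_coef[3] * z.mean())
```

The interaction t value agrees across all three fits, while the lower-order coefficient on `x` changes: in the raw model it is the slope of `x` when `z` is 0, and in the centered model it is the slope of `x` at the mean of `z`.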
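For the categorical case described in point 2, a small sketch of building the design matrix by hand might look like the following. The data, the three group labels, and the particular contrast codes are all made up for illustration; any set of contrasts of interest would be handled the same way.

```python
import numpy as np

# Hypothetical data: one three-level factor and one continuous predictor
group = np.array(["a", "b", "c", "a", "b", "c", "a", "b"])
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
xc = x - x.mean()  # mean-centering is optional

# Three levels allow at most two contrasts (levels minus 1).
# Contrast 1: b versus a. Contrast 2: c versus the average of a and b.
contrast_codes = {
    "a": (-0.5, -1.0 / 3.0),
    "b": (0.5, -1.0 / 3.0),
    "c": (0.0, 2.0 / 3.0),
}
codes = np.array([contrast_codes[g] for g in group])  # shape (8, 2)

# Each interaction column is a contrast column times the continuous variable
interactions = codes * xc[:, None]

# Columns: intercept, xc, contrast 1, contrast 2, contrast 1 * xc, contrast 2 * xc
X = np.column_stack([np.ones(len(x)), xc, codes, interactions])
print(X.shape)
```

The lower-order columns (`xc` and both contrasts) stay in the matrix alongside the products, matching the assumption in point 3.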