Solved – Standardize binary variable to create interaction term in regression

categorical data, interaction, regression, standardization

I am currently running a multiple linear regression, and I have a question about how to properly construct an interaction term between a binary variable (sex) and a continuous variable (age) in the model.

I've been advised to standardize variables before creating a product term and entering it into the regression model. This is done to avoid potential multicollinearity between the interaction term and its component variables.
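To illustrate the collinearity concern, here is a minimal sketch on simulated data (the sample size, age distribution, and seed are assumptions chosen for illustration, not values from the question). It compares how strongly the sex dummy correlates with the product term when age is left raw versus when age is standardized; the dummy itself stays coded 0/1 in both cases.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed; data are simulated
n = 1000
sex = rng.integers(0, 2, size=n).astype(float)  # dummy: 1 = female, 0 = male
age = rng.normal(40, 10, size=n)                # hypothetical ages

# Standardize only the continuous variable
age_z = (age - age.mean()) / age.std()

# Correlation between the dummy and the product term
corr_raw = np.corrcoef(sex, sex * age)[0, 1]    # product built from raw age
corr_std = np.corrcoef(sex, sex * age_z)[0, 1]  # product built from standardized age

print(f"corr(sex, sex*age)   = {corr_raw:.3f}")
print(f"corr(sex, sex*age_z) = {corr_std:.3f}")
```

With raw age, the product term is close to a rescaled copy of the dummy, so the two are highly correlated; centering age removes most of that overlap.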

However, I am unsure whether I should follow this advice and also standardize my binary variable (sex) before creating a product term with the standardized continuous variable (age).

Some people suggest that both dummy and continuous variables should be standardized so that they are on the same scale, while others suggest that there is no need to standardize categorical (dummy) variables.

Can you kindly advise what might be a more appropriate way to handle this? Thank you!

Best Answer

It makes little sense to standardize dummy variables:

  1. A dummy variable cannot be increased by one standard deviation, so the usual interpretation of standardized coefficients does not apply.

  2. Moreover, the standard interpretation of the dummy coefficient, the difference in the average level of Y between the two categories, is lost.
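The two points above can be checked with a small sketch on simulated data (the coefficients, sample size, and the helper name `ols` are assumptions for illustration). With a 0/1 dummy the fitted coefficient equals the difference in group means of Y; after standardizing the dummy, the coefficient is that same difference multiplied by the dummy's standard deviation, which no longer corresponds to any change the variable can actually make.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed; data are simulated
n = 500
sex = rng.integers(0, 2, size=n).astype(float)  # 1 = female, 0 = male
y = 2.0 + 1.5 * sex + rng.normal(0, 1, size=n)  # hypothetical outcome

def ols(X, y):
    """Least-squares coefficients for design matrix X (hypothetical helper)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)

# Dummy left as 0/1: slope is the difference in mean Y between groups
b_raw = ols(np.column_stack([ones, sex]), y)

# Dummy standardized: slope is rescaled by the dummy's standard deviation
sex_z = (sex - sex.mean()) / sex.std()
b_std = ols(np.column_stack([ones, sex_z]), y)

mean_diff = y[sex == 1].mean() - y[sex == 0].mean()
print("difference in group means:", mean_diff)
print("coefficient on 0/1 dummy: ", b_raw[1])
print("coefficient on z-scored dummy:", b_std[1])
```

The model fit is identical in both cases; only the scale of the coefficient changes, and the 0/1 version is the one with a direct reading.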

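Your interaction results could then be interpreted as follows:

With the sex dummy coded 1 = female, 0 = male and age standardized (mean = 0, SD = 1), the coefficient of the interaction term is the difference between females and males in the effect of a one-standard-deviation increase in age on your dependent variable (Y). The coefficient on standardized age is the effect among males, and the effect among females is the age coefficient plus the interaction coefficient. Report the sign, size, and significance of the interaction coefficient accordingly.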
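To make the role of the interaction coefficient concrete, here is a minimal sketch on simulated data (the true coefficients, sample size, and seed are assumptions for illustration). It fits Y on the 0/1 sex dummy, standardized age, and their product, then compares the result with separate regressions of Y on standardized age within each sex.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed; data are simulated
n = 800
sex = rng.integers(0, 2, size=n).astype(float)  # 1 = female, 0 = male
age = rng.normal(40, 10, size=n)                # hypothetical ages
age_z = (age - age.mean()) / age.std()          # standardize age only

# Hypothetical data-generating process with an interaction
y = 1.0 + 0.5 * sex + 0.8 * age_z + 0.4 * sex * age_z + rng.normal(0, 1, size=n)

# Full model: intercept, sex, age_z, sex * age_z
X = np.column_stack([np.ones(n), sex, age_z, sex * age_z])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# Separate simple regressions within each group (polyfit returns slope first)
slope_m, _ = np.polyfit(age_z[sex == 0], y[sex == 0], 1)
slope_f, _ = np.polyfit(age_z[sex == 1], y[sex == 1], 1)

print("age_z coefficient (slope for males):", beta[2])
print("interaction coefficient:            ", beta[3])
print("slope for females (sum of the two): ", beta[2] + beta[3])
print("female slope minus male slope:      ", slope_f - slope_m)
```

In this setup the age coefficient reproduces the slope among males (the reference category), and the interaction coefficient reproduces the female slope minus the male slope.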

The links below might help:

page 5 of this link http://polisci.msu.edu/jacoby/icpsr/regress3/lectures/week2/8.RelImport.pdf

page 9 of this link https://stat.ethz.ch/~maathuis/teaching/stat423/handouts/Chapter7.pdf