Solved – Mean centering interaction terms

centering, interaction, mean, multicollinearity, regression

I want to mean center my interaction terms in a regression model (i.e., make the mean zero for each variable). I understand that I am supposed to mean center my variables first and then multiply them together to create my interaction term. But is it a problem that when I multiply two negative scores, I will have a positive score? I haven't been able to find a good answer to this. Thank you!

Best Answer

You don't have to mean-center variables that are included in interaction terms. Back in the dark ages, when people did statistical calculations by hand on mechanical (not electronic) calculators having limited precision, there might have been some practical advantages to centering first. But modern computing hardware and software make that unnecessary. Frank Harrell has commented here: "I almost never use centering, finding it completely unnecessary and confusing."

But if you do center, you will still get the correct results because of your observation that "when I multiply two negative scores, I will have a positive score."

Say that all regression coefficients (including for interactions) and the variables in their original scales are positive. Then a two-way interaction term adds a more positive contribution to the final prediction than either of the variables would contribute individually.

Now say that you center the data, and you have a situation where both predictor variables have values below their means. You still want that two-way interaction to add a more positive contribution to the final prediction than either of the variables would contribute individually. So their "positive score" in the interaction is just what you want. The difference is that, after centering, the individual contributions of both predictors will have been negative relative to the (new) intercept of the mean-centered model.

Centering does change the intercept and the coefficients of any variable that is involved in an interaction with a centered variable. The coefficient of a centered predictor itself will not change, however, unless it is involved in an interaction with another centered variable.
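As a quick numerical check (a sketch with made-up data, using plain numpy least squares rather than any particular regression package), fitting the same interaction model on centered and uncentered predictors gives identical fitted values, because the two design matrices span the same column space:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(5, 2, n)
z = rng.normal(3, 1, n)
y = 1 + 0.5 * x + 2 * z + 0.3 * x * z + rng.normal(0, 1, n)

def fit(X, y):
    # ordinary least squares via lstsq; returns the coefficient vector
    return np.linalg.lstsq(X, y, rcond=None)[0]

xc = x - x.mean()  # mean-centered x

X_raw = np.column_stack([np.ones(n), x, z, x * z])    # uncentered design
X_cen = np.column_stack([np.ones(n), xc, z, xc * z])  # x-centered design

b_raw = fit(X_raw, y)
b_cen = fit(X_cen, y)

# Centering is just a re-parameterization: the fitted values agree
print(np.allclose(X_raw @ b_raw, X_cen @ b_cen))  # True
```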

To see this, consider the following linear model for $y$ using predictor $x$ centered around its mean value $\bar x$ and uncentered $z$:

$$y = \beta_0 +\beta_1(x-\bar x)+\beta_2z+\beta_3(x-\bar x)z$$

Collecting together terms that are constant, those that change only with $x$, those that change only with $z$, and those involving the interaction, we get:

$$y = (\beta_0 - \beta_1\bar x)+\beta_1 x+ (\beta_2 - \beta_3\bar x)z+\beta_3xz$$

Compare that against the corresponding model with neither $x$ nor $z$ centered:

$$y=\beta_0' + \beta_1'x+\beta_2'z +\beta_3' xz$$

So centering $x$ changes the intercept and the coefficient for $z$ from the uncentered model, but leaves the coefficients for $x$ and for the $xz$ interaction unchanged.
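These coefficient identities can be verified numerically. In this sketch (simulated data, plain numpy), `b` holds the coefficients $\beta$ of the $x$-centered model and `bp` the coefficients $\beta'$ of the uncentered one:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(10, 3, n)
z = rng.normal(2, 1, n)
y = 4 + 1.5 * x - 2 * z + 0.5 * x * z + rng.normal(0, 1, n)

xbar = x.mean()
xc = x - xbar

# Uncentered model: y ~ 1 + x + z + x*z  ->  beta'
Xp = np.column_stack([np.ones(n), x, z, x * z])
bp = np.linalg.lstsq(Xp, y, rcond=None)[0]

# x-centered model: y ~ 1 + (x - xbar) + z + (x - xbar)*z  ->  beta
Xc = np.column_stack([np.ones(n), xc, z, xc * z])
b = np.linalg.lstsq(Xc, y, rcond=None)[0]

# Identities from collecting terms:
# beta0' = beta0 - beta1*xbar,  beta2' = beta2 - beta3*xbar,
# beta1' = beta1,               beta3' = beta3
print(np.allclose(bp[0], b[0] - b[1] * xbar))  # intercept shifts
print(np.allclose(bp[2], b[2] - b[3] * xbar))  # z coefficient shifts
print(np.allclose(bp[1], b[1]))                # x coefficient unchanged
print(np.allclose(bp[3], b[3]))                # interaction unchanged
```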

The reported p-values for the coefficient for $z$ will differ between the uncentered and $x$-centered models. That might seem troubling at first, but that's OK. The correct test for significance of a predictor involved in an interaction must involve both its individual coefficient and its interaction coefficient, and the result of that test is unchanged by centering.
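To illustrate that last point, here is a sketch (simulated data, F statistic computed by hand from residual sums of squares) of the joint test for $z$, which drops both the $z$ term and the $xz$ interaction. Because the full models span the same column space, and so do the reduced models, the F statistic is identical with or without centering:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x = rng.normal(5, 2, n)
z = rng.normal(1, 1, n)
y = 2 + x + 0.5 * z + 0.2 * x * z + rng.normal(0, 1, n)
xc = x - x.mean()

def sse(X, y):
    # residual sum of squares from an OLS fit
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ b
    return r @ r

def joint_F(X_full, X_red, y):
    # F statistic for the restriction dropping the extra columns of X_full
    q = X_full.shape[1] - X_red.shape[1]
    df = len(y) - X_full.shape[1]
    return ((sse(X_red, y) - sse(X_full, y)) / q) / (sse(X_full, y) / df)

one = np.ones(n)
# Full models (uncentered vs x-centered); reduced models drop z and x*z
F_raw = joint_F(np.column_stack([one, x, z, x * z]),
                np.column_stack([one, x]), y)
F_cen = joint_F(np.column_stack([one, xc, z, xc * z]),
                np.column_stack([one, xc]), y)
print(np.allclose(F_raw, F_cen))  # the joint test for z is unchanged
```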
