I want to mean center my interaction terms in a regression model (i.e., make the mean zero for each variable). I understand that I am supposed to mean center my variables first and then multiply them together to create my interaction term. But is it a problem that when I multiply two negative scores, I will have a positive score? I haven't been able to find a good answer to this. Thank you!
Solved – Mean centering interaction terms
centering, interaction, mean, multicollinearity, regression
Related Solutions
When you discuss the interaction, you don't discuss it in terms of what happens to y but in terms of what happens to the effects of a1 and a2. If the interaction is positive, then as a1 increases, the slope of a2 increases. If it's negative, then as a1 increases, the slope of a2 decreases. Interactions are about differences in effects. The slopes in a regression are the effects, so it's most efficient to discuss interactions in those terms.
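To make "the slope is the effect" concrete, here is a minimal sketch with hypothetical coefficients (the values and the helper `slope_of_a2` are assumptions for illustration, not from the original answer). In a model y = b0 + b1·a1 + b2·a2 + b3·a1·a2, the slope of a2 at a given a1 is b2 + b3·a1:

```python
# Hypothetical fitted model: y = b0 + b1*a1 + b2*a2 + b3*a1*a2.
# The effect (slope) of a2 depends on the level of a1: d(y)/d(a2) = b2 + b3*a1.
b0, b1, b2, b3 = 1.0, 0.5, 2.0, -0.8  # assumed coefficients; b3 < 0

def slope_of_a2(a1):
    """Slope (effect) of a2 at a given level of a1."""
    return b2 + b3 * a1

# A negative interaction: as a1 increases, the slope of a2 decreases.
print(slope_of_a2(0.0))  # 2.0
print(slope_of_a2(1.0))  # 1.2
```

So the sign of b3 tells you whether the effect of a2 grows or shrinks as a1 increases, which is exactly the "difference in effects" reading above.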
Once you consider the foregoing, you can understand why many say that main effects don't mean anything when there is an interaction. I don't agree with that as a generalization, but there are certainly many cases where it's true.
The p-value of the two versions of cmpg will be the same, and whether you center before or after excluding incomplete cases is only a matter of your choice. You really don't need a poll to make the decision; as long as you explain it clearly in the Methods section, you're all set.
Practically, I would slightly favor centering the variables using only the cases that will be in the model (i.e., after listwise deletion). The reason is that it's a lot more natural to read:
Cases with missing values were excluded from this analysis. Continuous independent variables were then centered at their means before the regression analysis.
than:
Continuous independent variables were centered at their means. Cases with missing values were excluded from the analysis.
Both will give the same slope and p-value for cmpg, but the second is more likely to confuse readers who understand enough about the technique to notice the ordering, but not enough to realize that the two orderings give the same result.
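A quick numeric check of that claim: a sketch, assuming a toy dataset where some cases are incomplete because a second predictor is missing (the variable names `cmpg` and `other`, the sample sizes, and the plain-numpy OLS helper are all assumptions for illustration). Centering cmpg at the full-sample mean versus the complete-case mean only shifts the intercept; the slope and t statistic (hence p-value) for cmpg are identical:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
cmpg = rng.normal(25, 5, n)           # stand-in for the cmpg variable
other = rng.normal(0, 1, n)
y = 1.0 + 0.3 * cmpg + 0.5 * other + rng.normal(0, 1, n)

# Make some cases incomplete via missingness in the *other* predictor.
other[:40] = np.nan
keep = ~np.isnan(other)

def fit(x1, x2, y):
    """OLS of y on [1, x1, x2]; returns (coefficients, t statistics)."""
    X = np.column_stack([np.ones(len(y)), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se

# Center cmpg BEFORE deletion (mean over all cases) ...
pre = fit(cmpg[keep] - cmpg.mean(), other[keep], y[keep])
# ... versus AFTER deletion (mean over complete cases only).
post = fit(cmpg[keep] - cmpg[keep].mean(), other[keep], y[keep])

# The slope and t statistic for cmpg agree to machine precision;
# only the intercept absorbs the different centering constant.
print(np.allclose(pre[0][1], post[0][1]))  # True
print(np.allclose(pre[1][1], post[1][1]))  # True
```

The two centerings differ only by a constant shift of the predictor, which the intercept absorbs, so everything inferential about the slope is unchanged.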
However, given the missing structure in my data I hardly know if the sample is in any way representative.
Not knowing the nature of the missingness is actually a much bigger issue here, although it's not the focus of the question. Lacking this knowledge, or even a defensible assumption about it, can hide potentially very large biases.
Best Answer
You don't have to mean-center variables that are included in interaction terms. Back in the dark ages, when people did statistical calculations by hand on mechanical (not electronic) calculators with limited precision, there might have been some practical advantage to centering first. But modern computing hardware and software make that unnecessary. Frank Harrell has commented here: "I almost never use centering, finding it completely unnecessary and confusing."
But if you do center, you will still get the correct results because of your observation that "when I multiply two negative scores, I will have a positive score."
Say that all regression coefficients (including for interactions) and the variables in their original scales are positive. Then a two-way interaction term adds a more positive contribution to the final prediction than either of the variables would contribute individually.
Now say that you center the data, and you have a situation where both predictor variables have values below their means. You still want that two-way interaction to add a more positive contribution to the final prediction than either of the variables would contribute individually. So their "positive score" in the interaction is just what you want. The difference is that, after centering, the individual contributions of both predictors will have been negative relative to the (new) intercept of the mean-centered model.
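The two paragraphs above can be checked numerically. Here is a minimal sketch (the data-generating values, variable names, and `fit_predict` helper are assumptions for illustration): fit the interaction model on raw and on mean-centered predictors, then predict at a point where both predictors sit below their means. The "negative times negative" interaction term is exactly what keeps the two predictions identical:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x = rng.normal(10, 2, n)
z = rng.normal(5, 1, n)
y = 2 + 0.4 * x + 0.7 * z + 0.3 * x * z + rng.normal(0, 1, n)

def fit_predict(x, z, y, xnew, znew):
    """Fit y ~ 1 + x + z + x:z by least squares; predict at (xnew, znew)."""
    X = np.column_stack([np.ones(len(y)), x, z, x * z])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0] + beta[1] * xnew + beta[2] * znew + beta[3] * xnew * znew

# A point where BOTH predictors are below their means: after centering,
# the interaction term is (negative * negative) = positive, as the asker noted.
xnew, znew = 7.0, 3.5
p_raw = fit_predict(x, z, y, xnew, znew)
p_cen = fit_predict(x - x.mean(), z - z.mean(), y,
                    xnew - x.mean(), znew - z.mean())
print(np.allclose(p_raw, p_cen))  # True
```

Centering reparameterizes the model without changing the fitted surface, so the positive product of two below-mean (centered) values is a feature, not a bug.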
Between centering and not, the intercept and coefficients for variables involved in interactions with centered variables will change. The coefficient for a centered predictor will not change, however, unless it is involved in an interaction with another centered variable.
To see this, consider the following linear model for $y$ using predictor $x$ centered around its mean value $\bar x$ and uncentered $z$:
$$y = \beta_0 +\beta_1(x-\bar x)+\beta_2z+\beta_3(x-\bar x)z$$
Collecting together terms that are constant, those that change only with $x$, those that change only with $z$, and those involving the interaction, we get:
$$y = (\beta_0 - \beta_1\bar x)+\beta_1 x+ (\beta_2 - \beta_3\bar x)z+\beta_3xz$$
Compare that against the corresponding model with neither $x$ nor $z$ centered:
$$y=\beta_0' + \beta_1'x+\beta_2'z +\beta_3' xz$$
So centering $x$ changes the intercept and the coefficient for $z$ from the uncentered model, but leaves the coefficients for $x$ and for the $xz$ interaction unchanged.
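Those coefficient relations can be verified numerically. A sketch, assuming toy data and a bare-numpy OLS helper (names and values are for illustration only): fit the model once with $x$ centered and once with nothing centered, and check each identity from the algebra above:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.normal(3, 1, n)
z = rng.normal(0, 1, n)
y = 1 + 2 * x - 1.5 * z + 0.5 * x * z + rng.normal(0, 1, n)

def ols(*cols):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(n), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

xc = x - x.mean()
b = ols(xc, z, xc * z)    # beta0..beta3 with x centered
bp = ols(x, z, x * z)     # beta0'..beta3' with nothing centered

# Coefficients for x and x*z match across the two parameterizations;
# the intercept and the z coefficient shift exactly as the algebra predicts.
print(np.allclose(bp[1], b[1]))                  # beta1' = beta1
print(np.allclose(bp[3], b[3]))                  # beta3' = beta3
print(np.allclose(bp[0], b[0] - b[1] * x.mean()))  # beta0' = beta0 - beta1*xbar
print(np.allclose(bp[2], b[2] - b[3] * x.mean()))  # beta2' = beta2 - beta3*xbar
```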
The reported p-values for the coefficient for $z$ will differ between the uncentered and $x$-centered models. That might seem troubling at first, but that's OK. The correct test for significance of a predictor involved in an interaction must involve both its individual coefficient and its interaction coefficient, and the result of that test is unchanged by centering.
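To see that the joint test is invariant, note that the centered and uncentered full models span the same column space, so their residual sums of squares are equal and the F statistic for dropping both the $z$ and $xz$ terms is identical. A sketch with assumed toy data (the `ssr` and `joint_F` helpers are illustrative, not from the original answer):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
x = rng.normal(3, 1, n)
z = rng.normal(0, 1, n)
y = 1 + 2 * x - 1.5 * z + 0.5 * x * z + rng.normal(0, 1, n)

def ssr(X):
    """Residual sum of squares from least-squares regression of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

def joint_F(x):
    """F statistic for the joint null that the z and x*z coefficients are zero."""
    ones = np.ones(n)
    full = np.column_stack([ones, x, z, x * z])
    reduced = np.column_stack([ones, x])
    q, dof = 2, n - full.shape[1]
    return ((ssr(reduced) - ssr(full)) / q) / (ssr(full) / dof)

# Same F statistic (hence same p-value) whether or not x is centered.
print(np.allclose(joint_F(x), joint_F(x - x.mean())))  # True
```

The individual p-value for $z$ moves around because it tests the effect of $z$ at $x=0$, a point whose meaning changes with centering; the joint test has no such dependence.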