How to test if an interaction is significant: interaction terms or model comparison?

Tags: interaction, regression, statistical significance

I'm asking because of a specific case I'm working on, but I would really appreciate a general answer, because this has been bugging me for some time:

I'm running regressions with interaction effects.
How do I test if the interaction is significant?

Option A: I look at the p-values of the interaction coefficients. If they are significant, the interaction is significant.

Option B: I run two regression models: one with all main effects and one with the main effects plus the interaction terms. If the explanatory power of the interaction model is significantly higher, I interpret the interaction.
(e.g., comparing the two nested models with the anova() function in R, which runs an F test)
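A minimal sketch of the option B comparison, assuming ordinary least squares with an intercept and simulated data: it computes the partial F statistic that R's anova() reports for two nested lm fits, using only NumPy. The helper names `rss` and `partial_f_test` and all simulated coefficients are hypothetical. The p-value would be the upper tail of the F distribution with (df_num, df_den) degrees of freedom (for example via `scipy.stats.f.sf`), which is left out here to keep the sketch dependency-free.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def partial_f_test(X_reduced, X_full, y):
    """Partial F statistic for nested OLS models (reduced columns are a subset of full columns)."""
    n = len(y)
    rss_r, rss_f = rss(X_reduced, y), rss(X_full, y)
    df_num = X_full.shape[1] - X_reduced.shape[1]  # number of added (interaction) terms
    df_den = n - X_full.shape[1]                   # residual df of the full model
    f_stat = ((rss_r - rss_f) / df_num) / (rss_f / df_den)
    return f_stat, df_num, df_den

# Hypothetical simulated data with a genuine x1:x2 interaction.
rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 0.5 * x1 - 0.3 * x2 + 0.8 * x1 * x2 + rng.normal(size=n)

ones = np.ones(n)
X_main = np.column_stack([ones, x1, x2])          # main effects only
X_int = np.column_stack([ones, x1, x2, x1 * x2])  # main effects + interaction

f_stat, df_num, df_den = partial_f_test(X_main, X_int, y)
print(f"F = {f_stat:.2f} on ({df_num}, {df_den}) df")
```

The same function works unchanged when several interaction columns are added at once; df_num then counts all of them, which is what makes this a joint test.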

Many of my colleagues choose option A, but I seem to recall that my statistics instructor insisted that option B is preferable.

This question has become pertinent because I have some models where the interaction term is significant, but the explanatory power of the models with and without the interaction is not significantly different.

Best Answer

Option B.

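Option A can be inconsistent, especially if there are categorical variables. A factor with $k$ levels enters the model as $k-1$ dummy variables, so its interaction with another predictor is spread over $k-1$ coefficients, each with its own p-value. Simply changing the reference group can drastically change the p-values of each dummy's main effect and of each dummy's interaction term in the regression output, even though the fitted model is the same.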

Option B provides a single overall test of all the interaction terms jointly, and its result is the same no matter which reference group is selected.
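To illustrate why option A depends on the reference group while option B does not, here is a small NumPy sketch, assuming simulated data with one three-level factor (hypothetical levels `a`, `b`, `c`) interacting with one continuous predictor `x`. The helpers `ols` and `design` are hypothetical and use treatment (dummy) coding. The sketch refits the same model with two different reference levels and compares the individual interaction t-statistics with the joint F statistic for all interaction terms.

```python
import numpy as np

def ols(X, y):
    """OLS fit: return coefficients, their t-statistics, and the residual sum of squares."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = float(resid @ resid)
    cov = (rss / (n - p)) * np.linalg.inv(X.T @ X)
    return beta, beta / np.sqrt(np.diag(cov)), rss

def design(group, x, ref, with_interaction):
    """Intercept, x, dummies for the non-reference levels, and optionally dummy * x."""
    levels = [lv for lv in np.unique(group) if lv != ref]
    dummies = [(group == lv).astype(float) for lv in levels]
    cols = [np.ones_like(x), x] + dummies
    if with_interaction:
        cols += [d * x for d in dummies]
    return np.column_stack(cols)

# Hypothetical simulated data: the slope of x differs by group.
rng = np.random.default_rng(1)
n = 150
group = rng.choice(np.array(["a", "b", "c"]), size=n)
x = rng.normal(size=n)
slope = {"a": 1.0, "b": 1.3, "c": 0.7}
y = np.array([slope[g] for g in group]) * x + rng.normal(size=n)

results = {}
for ref in ["a", "b"]:
    X_r = design(group, x, ref, with_interaction=False)
    X_f = design(group, x, ref, with_interaction=True)
    _, _, rss_r = ols(X_r, y)
    _, t_full, rss_f = ols(X_f, y)
    df_num, df_den = X_f.shape[1] - X_r.shape[1], n - X_f.shape[1]
    f_stat = ((rss_r - rss_f) / df_num) / (rss_f / df_den)
    results[ref] = (t_full[-2:], f_stat)  # last two coefficients are the interactions
    print(f"reference={ref}: interaction t = {np.round(t_full[-2:], 2)}, joint F = {f_stat:.4f}")
```

With reference `a` the two interaction coefficients contrast the slopes of `b` and `c` with the slope of `a`; with reference `b` they contrast `a` and `c` with `b`. The individual t-statistics (and hence p-values) therefore change, while the joint F statistic stays the same because both codings span the same column space and give identical fitted values.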


Do you have any insight into the case with "normal" continuous predictors?

For ordinary continuous predictors, the p-value of the interaction coefficient is the same as the F-test p-value. Assuming both $x_1$ and $x_2$ are continuous, the no-interaction model is:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$

And the interaction model is:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 (x_1 \times x_2) + \varepsilon$$

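These two correspond to the "reduced" and "full" models of the F-test. Since the only extra term is $\beta_3 (x_1 \times x_2)$, the extra sum of squares in the full model is contributed by it alone, and the F-test has a single numerator degree of freedom. In that case $F = t_3^2$, where $t_3$ is the t-statistic of $\hat\beta_3$; because the square of a t variable with $n-4$ degrees of freedom ($n$ observations, four coefficients) follows an $F_{1,\,n-4}$ distribution, the p-value of the interaction coefficient is exactly the p-value of the F-test. So with continuous predictors, options A and B give the same answer.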
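A quick numerical check of the continuous case, assuming simulated data and plain OLS in NumPy (the helper `ols_fit` and the simulated coefficients are hypothetical): with a single interaction term, the partial F statistic equals the square of the interaction coefficient's t-statistic.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS coefficients, their t-statistics, and the residual sum of squares."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = float(resid @ resid)
    se = np.sqrt(np.diag((rss / (n - p)) * np.linalg.inv(X.T @ X)))
    return beta, beta / se, rss

# Hypothetical simulated data with two continuous predictors.
rng = np.random.default_rng(42)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2 + x1 + 0.5 * x2 + 0.2 * x1 * x2 + rng.normal(size=n)

ones = np.ones(n)
X_reduced = np.column_stack([ones, x1, x2])        # y = b0 + b1 x1 + b2 x2
X_full = np.column_stack([ones, x1, x2, x1 * x2])  # ... + b3 (x1 * x2)

_, _, rss_r = ols_fit(X_reduced, y)
_, t_full, rss_f = ols_fit(X_full, y)

df_den = n - X_full.shape[1]                  # n - 4 residual degrees of freedom
f_stat = (rss_r - rss_f) / (rss_f / df_den)   # numerator df = 1 (one extra term)
t3 = t_full[3]                                # t-statistic of the interaction coefficient
print(f"F = {f_stat:.6f}, t^2 = {t3 ** 2:.6f}")
```

The two printed numbers agree up to floating-point rounding, which is why the coefficient's two-sided p-value and the F-test p-value coincide here; the agreement breaks down only when the interaction adds more than one coefficient, as with a multi-level factor.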
