Solved – Treatment of interactions in multiple regression

anova, multiple regression

I'm confused about how interactions are treated in multiple regression. When doing a factorial ANOVA, I'd just look at the graphs; if the lines aren't parallel, there's an interaction, so I'd run my ANOVA with the interaction term and check whether it's significant.

However, in multiple regression, I add my interaction term, see that it's not significant, and find that none of my factors are either, since the p-values are all thrown off. So I centre all my factors, rerun the analysis with the centred factors and the centred interaction term, and still find the interaction isn't significant… but apparently I'm meant to keep this interaction term in the model.

I don't think I fully understand why interactions seem to be treated differently in factorial ANOVA compared to multiple regression – is there something I'm missing here?

Best Answer

Interactions shouldn't be treated differently in ANOVA and regression. In fact, an ANOVA is a regression analysis with only categorical predictors. I can say a few things here concerning your question. (Note that for a treatment of some of the underlying issues, I gave a fuller response here which may be helpful for understanding the big picture better.)

First, when conducting an ANOVA, if you look at a graph and then decide which terms to enter into the model, this is logically equivalent to an automatic model selection procedure (even though you did it yourself rather than having the software do it for you). What that means in practical terms is that the p-values you get from your software's output are wrong. For instance, you might believe that $p<.05$ because that's what the software reports, but actually it isn't.
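To see why, here is a small simulation sketch (my own illustration, not from the original answer): under a true null, screening several candidate predictors by eye or by fit and then testing only the best-looking one inflates the false-positive rate well above the nominal 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, n_candidates, n_sims = 50, 5, 2000
false_positives = 0

for _ in range(n_sims):
    y = rng.normal(size=n)                  # response unrelated to anything
    X = rng.normal(size=(n, n_candidates))  # candidate predictors, all null
    # "Look at the data first": keep only the predictor that fits best,
    # then test it as if it had been chosen in advance.
    pvals = [stats.pearsonr(X[:, j], y)[1] for j in range(n_candidates)]
    if min(pvals) < 0.05:
        false_positives += 1

print(f"Nominal alpha: 0.05, actual rate: {false_positives / n_sims:.3f}")
# With 5 candidates, the actual rate comes out near 1 - 0.95**5, about 0.23.
```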

If you are concerned about the possibility of an interaction amongst your covariates when you commence your study (that is, before you've ever seen your data), you should include an interaction term in your model. This is true for both ANOVAs and regressions (especially since they're ultimately the same thing). Whether or not it turns out to be 'significant', it should stay in your model (again, for both ANOVAs and regression models).
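For concreteness, here is a minimal sketch of specifying the interaction up front, using Python's statsmodels formula interface (the variable names x1, x2, and y are hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
df["y"] = 1 + 0.5 * df.x1 + 0.3 * df.x2 + 0.4 * df.x1 * df.x2 \
          + rng.normal(size=100)

# 'x1 * x2' expands to x1 + x2 + x1:x2, so both main effects and the
# interaction stay in the model whatever their p-values turn out to be.
model = smf.ols("y ~ x1 * x2", data=df).fit()
print(model.summary())
```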

Centering predictors helps only sometimes, and often has no effect. One case where centering does help is when a predictor ranges only over positive values and you form a polynomial term (such as $x^2$) to capture a curvilinear relationship. If you center the predictor first, the squared term slopes downward over the first half of the range and upward over the second half, which makes it much less correlated with the original variable. But often centering has no effect, so don't be surprised if variables remain 'non-significant' after centering; that's just not what centering does.
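A quick numerical illustration of that one case (my own sketch, with a hypothetical positive-valued predictor): centering before squaring sharply reduces the correlation between the linear and squared terms.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=500)     # strictly positive predictor
xc = x - x.mean()                    # centered version

print(np.corrcoef(x, x**2)[0, 1])    # near 1: raw x and x^2 nearly collinear
print(np.corrcoef(xc, xc**2)[0, 1])  # near 0: the centered square is U-shaped
```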

Lastly, you can plot your data to examine interactions in a regression setting (i.e., with continuous predictors). One approach is to use a coplot and look to see whether the relationship between one covariate and the response variable changes across levels of the other covariate. (Remember, though, that you should not change which terms you enter into the model based on what you see there.) You can also plot the interaction implied by your fitted model: select one of the variables to condition on, and plot the relationship between the other variable and the response when the conditioning variable is held at its mean and at $\pm$ 1 SD.
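Here is a self-contained sketch of that mean / $\pm$ 1 SD interaction plot in Python (again with hypothetical variables x1, x2, and y; statsmodels and matplotlib are assumed available):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["y"] = 1 + 0.5 * df.x1 + 0.3 * df.x2 + 0.4 * df.x1 * df.x2 \
          + rng.normal(size=200)
model = smf.ols("y ~ x1 * x2", data=df).fit()

# Plot predicted y against x1 with x2 held at its mean and at +-1 SD.
x1_grid = np.linspace(df.x1.min(), df.x1.max(), 100)
m, s = df.x2.mean(), df.x2.std()
for level, label in [(m - s, "mean - 1 SD"), (m, "mean"), (m + s, "mean + 1 SD")]:
    pred = model.predict(pd.DataFrame({"x1": x1_grid, "x2": level}))
    plt.plot(x1_grid, pred, label=f"x2 = {label}")

plt.xlabel("x1"); plt.ylabel("predicted y"); plt.legend()
plt.show()
```

If the three lines are parallel, the fitted interaction is negligible; diverging or crossing lines show how the x1 slope changes with x2.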
