Solved – Treatment of interactions in multiple regression

anova, multiple regression

I'm confused about how interactions are treated in multiple regression. When doing a factorial ANOVA, I'd just look at the graphs; if the lines aren't parallel, there's an interaction, so I'd run my ANOVA with the interaction term and check whether it's significant.

However, in multiple regression, I add my interaction term, see that it's not significant, and find that none of my factors are either, since the p-values are all thrown off. So I centre all my factors, rerun the analysis with the centred factors and the centred interaction term, and still find the interaction isn't significant… but apparently I'm meant to keep this interaction term in the model.

I don't think I fully understand why interactions seem to be treated differently in factorial ANOVA compared to multiple regression – is there something I'm missing here?

Best Answer

Interactions shouldn't be treated differently in ANOVA and regression. In fact, an ANOVA is a regression analysis with only categorical predictors. I can say a few things here concerning your question. (Note that for a treatment of some of the underlying issues, I gave a fuller response here which may be helpful for understanding the big picture better.)

First, when conducting an ANOVA, if you look at a graph and then decide which terms to enter into the model, this is logically equivalent to an automatic model selection procedure (even though you did it yourself rather than having the software do it for you). What that means in practical terms is that the p-values you get from your software's output are wrong. For instance, you might believe that $p<.05$ because that's what the software reports, but actually it isn't.
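To see why, here is a small simulation sketch (my own illustration, not from the original answer): under a true null, screening several candidate predictors by eye or by fit and then testing only the best-looking one inflates the false-positive rate well above the nominal 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, n_candidates, n_sims = 50, 5, 2000
false_positives = 0

for _ in range(n_sims):
    y = rng.normal(size=n)                  # response unrelated to anything
    X = rng.normal(size=(n, n_candidates))  # candidate predictors, all null
    # "Look at the data first": keep only the predictor that fits best,
    # then test it as if it had been chosen in advance.
    pvals = [stats.pearsonr(X[:, j], y)[1] for j in range(n_candidates)]
    if min(pvals) < 0.05:
        false_positives += 1

print(f"Nominal alpha: 0.05, actual rate: {false_positives / n_sims:.3f}")
# With 5 candidates, the actual rate comes out near 1 - 0.95**5, about 0.23.
```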

If you are concerned about the possibility of an interaction amongst your covariates when you commence your study (that is, before you've ever seen your data), you should include an interaction term in your model. This is true for both ANOVAs and regressions (especially since they're ultimately the same thing). Whether or not it turns out to be 'significant', it should stay in your model (again, for both ANOVAs and regression models).
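For concreteness, here is a minimal sketch of specifying the interaction up front, using Python's statsmodels formula interface (the variable names x1, x2, and y are hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
df["y"] = 1 + 0.5 * df.x1 + 0.3 * df.x2 + 0.4 * df.x1 * df.x2 \
          + rng.normal(size=100)

# 'x1 * x2' expands to x1 + x2 + x1:x2, so both main effects and the
# interaction stay in the model whatever their p-values turn out to be.
model = smf.ols("y ~ x1 * x2", data=df).fit()
print(model.summary())
```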

Centering predictors helps only sometimes, and often has no effect. One case where centering does help is when a predictor ranges only over positive values and you form a polynomial term (such as $x^2$) to capture a curvilinear relationship. If you center the predictor first, the squared term slopes downward over the first half of the range and upward over the second half, which makes it much less correlated with the original variable. But often centering has no effect, so don't be surprised if variables remain 'non-significant' after centering; that's just not what centering does.
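A quick numerical illustration of that one case (my own sketch, with a hypothetical positive-valued predictor): centering before squaring sharply reduces the correlation between the linear and squared terms.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=500)     # strictly positive predictor
xc = x - x.mean()                    # centered version

print(np.corrcoef(x, x**2)[0, 1])    # near 1: raw x and x^2 nearly collinear
print(np.corrcoef(xc, xc**2)[0, 1])  # near 0: the centered square is U-shaped
```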

Lastly, you can plot your data to examine interactions in a regression setting (i.e., with continuous predictors). One approach is to use a coplot and look to see whether the relationship between one covariate and the response variable changes across levels of the other covariate. (Remember, though, that you should not change which terms you enter into the model based on what you see there.) You can also plot the interaction implied by your fitted model: select one of the variables to condition on, and plot the relationship between the other variable and the response when the conditioning variable is held at its mean and at $\pm$ 1 SD.
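Here is a self-contained sketch of that mean / $\pm$ 1 SD interaction plot in Python (again with hypothetical variables x1, x2, and y; statsmodels and matplotlib are assumed available):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["y"] = 1 + 0.5 * df.x1 + 0.3 * df.x2 + 0.4 * df.x1 * df.x2 \
          + rng.normal(size=200)
model = smf.ols("y ~ x1 * x2", data=df).fit()

# Plot predicted y against x1 with x2 held at its mean and at +-1 SD.
x1_grid = np.linspace(df.x1.min(), df.x1.max(), 100)
m, s = df.x2.mean(), df.x2.std()
for level, label in [(m - s, "mean - 1 SD"), (m, "mean"), (m + s, "mean + 1 SD")]:
    pred = model.predict(pd.DataFrame({"x1": x1_grid, "x2": level}))
    plt.plot(x1_grid, pred, label=f"x2 = {label}")

plt.xlabel("x1"); plt.ylabel("predicted y"); plt.legend()
plt.show()
```

If the three lines are parallel, the fitted interaction is negligible; diverging or crossing lines show how the x1 slope changes with x2.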
