Solved – Removing interaction term from repeated measures two-way ANOVA in R: Anova() function in car package


My question concerns excluding the interaction term (once it's deemed non-significant) in a two-way repeated measures ANOVA fitted with the Anova() function in the car package. This question is motivated by:

  1. Trying to better understand how the Anova() function works
  2. Curiosity
  3. A desire to be consistent with how I have taught other types of ANOVA (I tell my students to remove a non-significant interaction term and refit the model to assess main effects)

Note: I understand the Anova() function has a type= option where one may request either the type II or III SS, and thus we could simply run the model with type=2 and assess the main effect p-values, even if the interaction isn't significant. However, for the reasons listed above, I'm still interested to know if there's any way to actually remove the interaction term and fit a main effects-only model.

Data description: The following example is from the UCLA website and is a repeated measures two-way ANOVA with one within-subject and one between-subject factor. The data set, called exer, consists of people who were randomly assigned to two different diets (low-fat and not low-fat) and three different types of exercise (at rest, walking leisurely, and running). Their pulse rate was measured at three different time points during their assigned exercise: at 1 minute, 15 minutes, and 30 minutes.

Here, I'm considering only time and diet as predictors (ignoring exercise for simplicity). Note that time is a within-subjects factor and diet is a between-subjects factor.

Data to recreate example:

exer <- read.csv("http://www.ats.ucla.edu/stat/data/exer.csv")

# Convert variables to factors
exer <- within(exer, {
  diet     <- factor(diet)
  exertype <- factor(exertype)
  time     <- factor(time)
  id       <- factor(id)
})

# Convert data to wide format for the sake of the Anova() function
exer_wide <- reshape(exer,
                     v.names   = "pulse",         # outcome variable
                     timevar   = "time",          # repeated-measures factor
                     idvar     = c("id", "diet"), # ID variable and non-time-varying predictors
                     direction = "wide")
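
To see what reshape() does without downloading the file, here is a minimal sketch on a tiny long-format data set. The object names toy_long and toy_wide and all pulse values are invented for illustration; they are not taken from exer. Each subject's three time-point rows collapse into one row with columns pulse.1, pulse.2 and pulse.3, which is the layout the multivariate lm() in Step 1 expects.

```r
# Hypothetical long-format data: 4 subjects x 3 time points (values invented)
toy_long <- data.frame(
  id    = factor(rep(1:4, each = 3)),
  diet  = factor(rep(c(1, 1, 2, 2), each = 3)),
  time  = factor(rep(1:3, times = 4)),
  pulse = c(80, 81, 83, 76, 78, 79, 88, 90, 91, 84, 85, 87)
)

# One row per subject; pulse is spread into one column per level of time
toy_wide <- reshape(toy_long,
                    v.names   = "pulse",
                    timevar   = "time",
                    idvar     = c("id", "diet"),
                    direction = "wide")
toy_wide
```

Listing diet in idvar (rather than leaving it to be carried along as a constant column) simply makes explicit that it does not vary within a subject.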

Snapshot of the data at this point:

head(exer_wide, 10)
#    id diet exertype pulse.1 pulse.2 pulse.3
# 1   1    1        1      85      85      88
# 4   2    1        1      90      92      93
# 7   3    1        1      97      97      94
# 10  4    1        1      80      82      83
# 13  5    1        1      91      92      91
# 16  6    2        1      83      83      84
# 19  7    2        1      87      88      90
# 22  8    2        1      92      94      95
# 25  9    2        1      97      99      96
# 28 10    2        1     100      97     100

Fitting the repeated measures two-way ANOVA:

Step 1: Create linear model object (note between-subjects factor on the right-hand side):

exer_lm <- lm(cbind(pulse.1, pulse.2, pulse.3) ~ diet, data=exer_wide)
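
Because the response is a matrix, lm() returns an object of class "mlm": in effect one regression of each time point's pulse on diet, all sharing the same design matrix. A small sketch on simulated data (the names sim_wide and sim_lm and all values are hypothetical) shows the shape of the object that later gets handed to Anova():

```r
set.seed(42)
# Hypothetical wide data: 10 subjects, 2 diets, 3 time points (simulated values)
sim_wide <- data.frame(diet = factor(rep(1:2, each = 5)))
sim_wide$pulse.1 <- rnorm(10, mean = 90, sd = 5)
sim_wide$pulse.2 <- sim_wide$pulse.1 + rnorm(10, mean = 1, sd = 2)
sim_wide$pulse.3 <- sim_wide$pulse.2 + rnorm(10, mean = 1, sd = 2)

# A matrix response on the left-hand side gives a multivariate linear model
sim_lm <- lm(cbind(pulse.1, pulse.2, pulse.3) ~ diet, data = sim_wide)

class(sim_lm)   # "mlm" "lm"
coef(sim_lm)    # rows (Intercept), diet2; one column per time point

# Each column matches a separate univariate lm() on that time point
all.equal(coef(sim_lm)[, "pulse.2"], coef(lm(pulse.2 ~ diet, data = sim_wide)))
```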

Step 2: Create time factor:

time_fac <- factor(c("1", "2", "3"), ordered = FALSE)  # one level per response column (pulse.1, pulse.2, pulse.3)

Step 3: Run ANOVA (using type II SS):

library(car)
exer_aov <- Anova(exer_lm, idata=data.frame(time_fac), idesign=~time_fac, type=2)
summary(exer_aov)

# Univariate Type II Repeated-Measures ANOVA Assuming Sphericity

#                   SS num Df Error SS den Df         F    Pr(>F)    
# (Intercept)   894608      1  11227.0     28 2231.1372 < 2.2e-16 ***
# diet            1262      1  11227.0     28    3.1471   0.08694 .  
# time_fac        2067      2   4900.6     56   11.8078 5.264e-05 ***
# diet:time_fac    193      2   4900.6     56    1.1017   0.33940    
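
As a quick check on how the univariate table is assembled: each F statistic is the effect mean square divided by the mean square of its error stratum. diet is tested against the between-subjects error (11227.0 on 28 df), while time_fac and diet:time_fac share the within-subjects error (4900.6 on 56 df). Plugging in the printed (rounded) sums of squares reproduces the reported F values up to rounding:

```latex
\begin{aligned}
F &= \frac{SS_{\text{effect}} / df_{\text{effect}}}{SS_{\text{error}} / df_{\text{error}}} \\
F_{\text{diet}} &= \frac{1262 / 1}{11227.0 / 28} \approx \frac{1262}{400.96} \approx 3.147 \\
F_{\text{time\_fac}} &= \frac{2067 / 2}{4900.6 / 56} \approx \frac{1033.5}{87.51} \approx 11.81 \\
F_{\text{diet:time\_fac}} &= \frac{193 / 2}{4900.6 / 56} \approx \frac{96.5}{87.51} \approx 1.10
\end{aligned}
```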

Note that both the univariate and the multivariate results (the latter not shown above) indicate the interaction is not significant.

Now, my question is whether there's a way to specify that we don't want the interaction term fitted in the model, or whether there's no way around this given how the Anova() function is set up.

Best Answer

While I'm no expert in repeated measures ANOVA, I have some familiarity with the Anova() function in car.

Type I or sequential Anova estimates a sequence of models in the order the terms appear in the model formula, which is effectively arbitrary, each time permanently removing the previously tested regressor from the subsequent step. Many of its steps are not necessarily interesting, simply because the full model isn't being considered in the tests. Type III Anova, on the other hand, seems overall like a snake pit that you don't touch unless you absolutely know what you're doing (e.g. specifying correct contrasts, correctly interpreting coefficients, and assorted philosophical conundrums).
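
The order dependence of Type I tests is easy to see on a small unbalanced example. The sketch below uses a made-up between-subjects layout (the factors A, B and the simulated response y are hypothetical) and base R's sequential anova(); with unequal cell counts, the sum of squares attributed to A changes depending on whether A enters the formula before or after B, even though both orders describe the same fitted model.

```r
set.seed(1)
# Hypothetical unbalanced 2x2 layout: cell counts 6, 2, 1, 3
A <- factor(c(rep("a1", 8), rep("a2", 4)))
B <- factor(c(rep("b1", 6), rep("b2", 2), rep("b1", 1), rep("b2", 3)))
y <- rnorm(12, mean = ifelse(A == "a2", 2, 0) + ifelse(B == "b2", 1, 0))

# Same two terms, entered in different orders
tab_AB <- anova(lm(y ~ A + B))  # tests A first, then B given A
tab_BA <- anova(lm(y ~ B + A))  # tests B first, then A given B

c(A_entered_first  = tab_AB["A", "Sum Sq"],   # SS(A), ignoring B
  A_entered_second = tab_BA["A", "Sum Sq"])   # SS(A | B), adjusted for B

# The residual line is identical: both orders fit the same model
all.equal(tab_AB["Residuals", "Sum Sq"], tab_BA["Residuals", "Sum Sq"])
```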

As for Type II Anova, as I understand it and as a general principle, it estimates a sequence of models with carefully chosen tests, each time removing a single regressor from the model while respecting the principle of marginality. The "principle of marginality" requires that, when comparing a model that includes a variable with a model that doesn't, all higher-order terms that incorporate that variable (e.g. interactions) be removed from both models. Otherwise, the full model is used at each step. For a more detailed account of how Anova(..., type=2) works and its theoretical underpinnings, see Fox and Weisberg (2011), Fox (2016), or even Venables and Ripley (2002) (the relevant sections on the principle of marginality are relatively short reads).

So without any other info (and unless ?Anova has indications to the contrary for these specific models), by looking at the table above I would assume that:

  • the F-test for the diet:time_fac interaction term was computed by comparing the full model against the model that doesn't include the interaction term (as usual).
  • the F-test for the time_fac main effect was computed by comparing two models that both have the interaction term removed: the model that includes both diet and time_fac vs the model that includes only diet.
  • the F-test for the diet main effect is analogous: the model that includes both diet and time_fac vs the model that includes only time_fac.
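
The comparisons in these bullets can be mimicked with nested model comparisons in base R. The sketch below is an analogy on an ordinary between-subjects two-way layout (hypothetical factors A and B, simulated y), not the repeated-measures mlm fit from the question: the interaction is tested against the additive model, and each main effect is tested between two models that both omit the interaction. Passing the full model as the last argument to anova() makes it use the full model's residual mean square as the F denominator, which, as far as I understand, is also what car's Anova(type = 2) does for an lm fit by default.

```r
# Hypothetical unbalanced two-way layout (cell counts 8/7/7 and 6/6/6)
A <- factor(rep(c("a1", "a2"), times = c(22, 18)))
B <- factor(c(rep_len(c("b1", "b2", "b3"), 22), rep_len(c("b1", "b2", "b3"), 18)))
set.seed(2)
y <- rnorm(40, mean = ifelse(A == "a2", 1, 0) + 0.5 * as.integer(B))

m_full <- lm(y ~ A * B)  # A + B + A:B
m_add  <- lm(y ~ A + B)  # interaction removed
m_A    <- lm(y ~ A)
m_B    <- lm(y ~ B)

# Interaction: additive model vs full model
test_AB <- anova(m_add, m_full)

# Main effect of B: y ~ A vs y ~ A + B (both without A:B), denominator from m_full
type2_B <- anova(m_A, m_add, m_full)

# Main effect of A: y ~ B vs y ~ A + B (both without A:B), denominator from m_full
type2_A <- anova(m_B, m_add, m_full)

test_AB
type2_B[2, c("Df", "Sum of Sq", "F", "Pr(>F)")]
type2_A[2, c("Df", "Sum of Sq", "F", "Pr(>F)")]
```

A useful sanity check: each of these F values equals the corresponding row of a sequential table in which the tested term enters after the other main effect but before the interaction, e.g. the "B" row of anova(m_full) and the "A" row of anova(lm(y ~ B * A)).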

So to answer your question: when the main effects are tested, the interaction term is automatically removed, as the principle of marginality requires, so each main effect is tested in a model that does not contain the interaction. If the interaction term is significant, you disregard the main effects; otherwise, you consider the main effects alone.
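
If you nonetheless want to literally refit without the interaction, as in the teaching workflow described in the question, one option outside car is base R's aov() with an Error() term on long-format data. The sketch below uses simulated data with a hypothetical layout (20 subjects, 2 diets, 3 times; the names sim_long, fit_int, fit_add and within_df are made up), not exer, and like the univariate table above it assumes sphericity. Dropping diet:time from the formula pools its sum of squares and degrees of freedom into the within-subjects error (here the residual df in that stratum goes from 36 to 38), so its F tests for the within-subject terms will generally differ slightly from a Type II table, which keeps the interaction out of the error term.

```r
set.seed(3)
n_id <- 20
# Hypothetical long data: subject, diet (between), time (within)
sim_long <- data.frame(
  id   = factor(rep(seq_len(n_id), each = 3)),
  diet = factor(rep(rep(1:2, each = n_id / 2), each = 3)),
  time = factor(rep(1:3, times = n_id))
)
subj_effect <- rnorm(n_id, sd = 5)  # random subject offsets
sim_long$pulse <- 90 + 2 * (sim_long$diet == "2") +
  1.5 * as.integer(sim_long$time) +
  subj_effect[as.integer(sim_long$id)] + rnorm(nrow(sim_long), sd = 2)

# Error(id/time) gives a between-subject (id) and a within-subject (id:time) stratum
fit_int <- aov(pulse ~ diet * time + Error(id/time), data = sim_long)
fit_add <- aov(pulse ~ diet + time + Error(id/time), data = sim_long)  # main effects only

summary(fit_add)

# Residual df in the within-subject stratum (row names are padded, hence trimws)
within_df <- function(fit) {
  tab <- summary(fit)[["Error: id:time"]][[1]]
  tab[trimws(rownames(tab)) == "Residuals", "Df"]
}
c(with_interaction = within_df(fit_int), main_effects_only = within_df(fit_add))
```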