My question pertains to excluding the interaction term (once it's deemed insignificant) in a two-way repeated measures ANOVA using the Anova() function in the car package. This question is motivated by:
- Trying to better understand how the Anova() function works
- Curiosity
- A desire to be consistent with how I have taught other types of ANOVAs (I tell my students to remove an insignificant interaction term and refit the model to assess main effects)
Note: I understand the Anova() function has a type= option where one may request either the type II or III SS, and thus we could simply run the model with type=2 and assess the main effect p-values, even if the interaction isn't significant. However, for the reasons listed above, I'm still interested to know if there's any way to actually remove the interaction term and fit a main effects-only model.
Data description: The following example is from the UCLA website and is a repeated measures two-way ANOVA with one within-subject and one between-subject factor. The data, called exer, consists of people who were randomly assigned to two different diets (low-fat and not low-fat) and three different types of exercise (at rest, walking leisurely and running). Their pulse rate was measured at three different time points during their assigned exercise: at 1 minute, 15 minutes and 30 minutes.
Here, I'm considering only time and diet as predictors (ignoring exercise for simplicity). Note that time is a within-subjects factor and diet is a between-subjects factor.
Data to recreate example:
exer <- read.csv("http://www.ats.ucla.edu/stat/data/exer.csv")
# Convert variables to factor
exer <- within(exer, {
  diet     <- factor(diet)
  exertype <- factor(exertype)
  time     <- factor(time)
  id       <- factor(id)
})
# Convert data to wide format for sake of Anova() function
exer_wide <- reshape(exer,
                     v.names   = "pulse",          # Outcome variable
                     timevar   = "time",           # Repeated measures
                     idvar     = c("id", "diet"),  # ID variable and non-time-varying predictors
                     direction = "wide")
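For readers who want to see what the reshape() call does without downloading the data, here is a toy sketch. The data frame toy_long and its pulse values are made up for illustration and are not part of the exer data; only the column layout is meant to match.

```r
# Toy long-format data: 2 subjects x 3 time points (values are invented)
toy_long <- data.frame(
  id    = factor(rep(1:2, each = 3)),
  diet  = factor(rep(c(1, 2), each = 3)),
  time  = factor(rep(1:3, times = 2)),
  pulse = c(80, 82, 85, 90, 91, 95)
)

# Same reshape() arguments as above: one row per subject,
# one pulse column per level of the time variable
toy_wide <- reshape(toy_long,
                    v.names   = "pulse",
                    timevar   = "time",
                    idvar     = c("id", "diet"),
                    direction = "wide")

print(toy_wide)
```

Each subject ends up on a single row with columns pulse.1, pulse.2 and pulse.3, which is the layout the cbind() call on the left-hand side of the lm() formula expects.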
Snapshot of the data at this point:
exer_wide
#    id diet exertype pulse.1 pulse.2 pulse.3
# 1   1    1        1      85      85      88
# 4   2    1        1      90      92      93
# 7   3    1        1      97      97      94
# 10  4    1        1      80      82      83
# 13  5    1        1      91      92      91
# 16  6    2        1      83      83      84
# 19  7    2        1      87      88      90
# 22  8    2        1      92      94      95
# 25  9    2        1      97      99      96
# 28 10    2        1     100      97     100
Fitting the repeated measures two-way ANOVA:
Step 1: Create linear model object (note between-subjects factor on the right-hand side):
exer_lm <- lm(cbind(pulse.1, pulse.2, pulse.3) ~ diet, data=exer_wide)
Step 2: Create time factor:
time_fac <- factor(c("1","2","3"), ordered=FALSE)
Step 3: Run ANOVA (using type II SS):
library(car)
exer_aov <- Anova(exer_lm, idata=data.frame(time_fac), idesign=~time_fac, type=2)
summary(exer_aov)
# Univariate Type II Repeated-Measures ANOVA Assuming Sphericity
#                   SS num Df Error SS den Df         F    Pr(>F)
# (Intercept)   894608      1  11227.0     28 2231.1372 < 2.2e-16 ***
# diet            1262      1  11227.0     28    3.1471   0.08694 .
# time_fac        2067      2   4900.6     56   11.8078 5.264e-05 ***
# diet:time_fac    193      2   4900.6     56    1.1017   0.33940
Note both the univariate and multivariate results indicate the interaction is not significant.
Now, my question is whether there's a way to specify that we don't want the interaction term fit in the model, or if there's no way around this given how the Anova() function is set up.
Best Answer
While I'm no expert in repeated measures ANOVA, I have some familiarity with the Anova() function in car.
Type I, or sequential, Anova estimates a sequence of models in an effectively arbitrary order, each time permanently removing the previously tested regressor from the subsequent step. Many of its steps are not necessarily interesting, simply because the full model isn't being considered in the tests. Type III Anova, meanwhile, seems overall like a snake pit that you don't touch unless you absolutely know what you're doing (e.g. specifying correct contrasts, correctly interpreting coefficients, and assorted philosophical conundrums).
As for Type II Anova, in my understanding and as a general principle, it estimates a sequence of models with carefully chosen tests, each time removing a single regressor from the model while respecting the principle of marginality. The "principle of marginality" requires that when comparing a model that includes a variable with a model that doesn't include it, all higher-order terms that incorporate said variable (e.g. interactions) should be removed from both models. The full model is used in each step if it doesn't conflict with the principle of marginality. For a more detailed account of how Anova(..., type=2) works and its theoretical underpinnings, see Fox and Weisberg (2011), Fox (2016) or even Venables and Ripley (2002) (the relevant sections that tackle the principle of marginality are relatively short reads).
So without any other info (and unless ?Anova has indications to the contrary for these specific models), by looking at the table above I would assume that:
- The F-test associated with the diet:time_fac interaction term was estimated by comparing the full model against the model that doesn't include the interaction term (as usual).
- The F-test associated with the time_fac main-effect regressor was estimated by comparing two models that both had the interaction term removed: the model that includes both diet and time_fac vs the model that includes only diet.
- The F-test associated with the diet main-effect regressor is similar to the above: the model that includes both diet and time_fac vs the model that includes only time_fac.
So to answer your question, the interaction term is automatically removed as required by the principle of marginality, so the main effects are tested without the confounding effect of the interaction term. If the interaction term is significant, you disregard the main effects; otherwise, you consider the main effects alone.
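To make the model comparisons described in this answer concrete, here is a minimal sketch for an ordinary between-subjects lm (not the repeated measures case), using R's built-in warpbreaks data as a stand-in. It reproduces only the Type II sums of squares by hand. The choice of data set and the helper rss() are my own assumptions for illustration, and the comparison with car is skipped if the package isn't installed.

```r
# Stand-in data: warpbreaks (built into R), factors wool and tension
fit_full    <- lm(breaks ~ wool * tension, data = warpbreaks)  # with interaction
fit_main    <- lm(breaks ~ wool + tension, data = warpbreaks)  # main effects only
fit_wool    <- lm(breaks ~ wool,           data = warpbreaks)
fit_tension <- lm(breaks ~ tension,        data = warpbreaks)

# Hypothetical helper: residual sum of squares of a fitted lm
rss <- function(m) sum(residuals(m)^2)

# Type II sums of squares as differences in RSS between nested models.
# Main effects are compared with the interaction removed from both models.
ss_type2 <- c(
  wool           = rss(fit_tension) - rss(fit_main),
  tension        = rss(fit_wool)    - rss(fit_main),
  "wool:tension" = rss(fit_main)    - rss(fit_full)
)
print(ss_type2)

# Optional cross-check against car, if it is installed
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(fit_full, type = 2))
}
```

If the description of marginality is right, the hand-computed values should agree with the Sum Sq column reported by Anova(..., type=2) for the same model. The sketch deliberately says nothing about which error term goes into the denominator of each F-test.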