Solved – Removing interaction term from repeated measures two-way ANOVA in R: Anova() function in car package

anova, interaction, r, repeated-measures

My question pertains to excluding the interaction term (once it's deemed insignificant) in a two-way repeated measures ANOVA using the Anova() function in the car package. This question is motivated by:

Trying to better understand how the Anova() function works
Curiosity
A desire to be consistent with how I have taught other types of ANOVAs (I tell my students to remove an insignificant interaction term and refit the model to assess main effects)

Note: I understand the Anova() function has a type= option where one may request either the type II or III SS, and thus we could simply run the model with type=2 and assess the main effect p-values, even if the interaction isn't significant. However, for the reasons listed above, I'm still interested to know if there's any way to actually remove the interaction term and fit a main effects-only model.

Data description: The following example is from the UCLA website and is a repeated measures two-way ANOVA with one within-subject and one between-subject factor. The data set, called exer, consists of people who were randomly assigned to one of two diets (low-fat and not low-fat) and one of three types of exercise (at rest, walking leisurely, and running). Their pulse rate was measured at three time points during their assigned exercise: at 1 minute, 15 minutes, and 30 minutes.

Here, I'm considering only time and diet as predictors (ignoring exercise for simplicity). Note that time is a within-subjects factor and diet is a between-subjects factor.

Data to recreate example:

exer <- read.csv("http://www.ats.ucla.edu/stat/data/exer.csv")

# Convert variables to factor
exer <- within(exer, {
  diet <- factor(diet)
  exertype <- factor(exertype)
  time <- factor(time)
  id <- factor(id)
})

# Convert data to wide format for sake of Anova() function
exer_wide <- reshape(exer,
                     v.names = "pulse",        # Outcome variable
                     timevar = "time",         # Repeated measures
                     idvar = c("id", "diet"),  # ID variable and non-time-varying predictors
                     direction = "wide")

Snapshot of the data at this point:

exer_wide
#    id diet exertype pulse.1 pulse.2 pulse.3
# 1   1    1        1      85      85      88
# 4   2    1        1      90      92      93
# 7   3    1        1      97      97      94
# 10  4    1        1      80      82      83
# 13  5    1        1      91      92      91
# 16  6    2        1      83      83      84
# 19  7    2        1      87      88      90
# 22  8    2        1      92      94      95
# 25  9    2        1      97      99      96
# 28 10    2        1     100      97     100

Fitting the repeated measures two-way ANOVA:

Step 1: Create linear model object (note between-subjects factor on the right-hand side):

exer_lm <- lm(cbind(pulse.1, pulse.2, pulse.3) ~ diet, data=exer_wide)

Step 2: Create time factor:

time_fac <- factor(c("1","2","3"), ordered=FALSE)

Step 3: Run ANOVA (using type II SS):

library(car)
exer_aov <- Anova(exer_lm, idata=data.frame(time_fac), idesign=~time_fac, type=2)
summary(exer_aov)

# Univariate Type II Repeated-Measures ANOVA Assuming Sphericity

#                   SS num Df Error SS den Df         F    Pr(>F)    
# (Intercept)   894608      1  11227.0     28 2231.1372 < 2.2e-16 ***
# diet            1262      1  11227.0     28    3.1471   0.08694 .  
# time_fac        2067      2   4900.6     56   11.8078 5.264e-05 ***
# diet:time_fac    193      2   4900.6     56    1.1017   0.33940

Note both the univariate and multivariate results indicate the interaction is not significant.
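As a quick check on how to read the univariate table, each F value is the effect mean square divided by the corresponding error mean square. The sketch below recomputes the F values and p-values from the rounded SS and df columns printed above, so it should agree with the table only up to rounding. The data frame and its column names are made up for this illustration. Note that the within-subject error df of 56 is what you get when the diet:time_fac interaction is in the model.

```r
# Values copied (rounded) from the univariate table above
tab <- data.frame(
  term     = c("diet", "time_fac", "diet:time_fac"),
  ss       = c(1262, 2067, 193),
  num_df   = c(1, 2, 2),
  error_ss = c(11227.0, 4900.6, 4900.6),
  den_df   = c(28, 56, 56)
)

# F = (SS / num Df) / (Error SS / den Df)
tab$f_value <- (tab$ss / tab$num_df) / (tab$error_ss / tab$den_df)

# Upper-tail p-value from the F distribution
tab$p <- pf(tab$f_value, tab$num_df, tab$den_df, lower.tail = FALSE)

print(tab)
```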

Now, my question is whether there's a way to specify that we don't want the interaction term fit in the model, or if there's no way around this given how the Anova() function is set-up.

Best Answer

While I'm no expert in repeated measures ANOVA, I have some familiarity with the Anova() function in car.

Type I (sequential) Anova tests the terms one at a time in the order they appear in the model formula, which is often effectively arbitrary; each term is adjusted only for the terms that precede it. Many of its steps are not necessarily interesting, simply because the full model isn't being considered in the tests. Type III Anova, meanwhile, seems overall like a snake pit that you don't touch unless you absolutely know what you're doing (e.g. how to specify the correct contrasts, interpret the coefficients correctly, and handle assorted philosophical conundrums).
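To see why the ordering matters for sequential tests, here is a minimal sketch using base R's anova() on the built-in mtcars data. This is an unbalanced between-subjects example chosen only for illustration, not the exer data, and the object names are made up. The same additive model is fitted with the terms in two different orders, and the sequential SS for cyl changes with its position.

```r
# Illustration on the built-in mtcars data (not the exer data from the question)
cars <- transform(mtcars, cyl = factor(cyl), am = factor(am))

# Same additive model, terms entered in two different orders
seq_cyl_first <- anova(lm(mpg ~ cyl + am, data = cars))
seq_am_first  <- anova(lm(mpg ~ am + cyl, data = cars))

# Sequential SS for cyl: unadjusted when entered first,
# adjusted for am when entered second
ss_cyl_first  <- seq_cyl_first["cyl", "Sum Sq"]
ss_cyl_second <- seq_am_first["cyl", "Sum Sq"]
c(first = ss_cyl_first, second = ss_cyl_second)
```

The residual SS is the same in both tables, since the fitted model is identical; only the split of the explained SS between the terms depends on the order.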

As for Type II Anova, in my understanding and as a general principle it estimates a sequence of models with carefully chosen tests, each time removing a single regressor from the model while respecting the principle of marginality. The "principle of marginality" requires that when comparing a model that includes a variable with a model that doesn't include it, all higher-order terms that incorporate said variable (e.g. interactions) should be removed from both models. The full model is used in each step if it doesn't conflict with the principle of marginality. For a more detailed account of how Anova(..., type=2) works and its theoretical underpinnings see Fox and Weisberg (2011), Fox (2016) or even Venables and Ripley (2002) (the relevant sections that tackle the principle of marginality are relatively short reads).
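The model comparisons described above can be sketched by hand in base R. The example uses the built-in mtcars data (an ordinary between-subjects design, not the repeated-measures exer data), and all object names are made up for illustration. It also assumes, as I understand the references above, that the denominator of each F-test is the residual mean square of the full model; compare the result against Anova(full, type=2) before relying on it.

```r
# Illustration on the built-in mtcars data (between-subjects design,
# not the repeated-measures exer data)
cars <- transform(mtcars, cyl = factor(cyl), am = factor(am))

full     <- lm(mpg ~ cyl * am, data = cars)  # main effects + interaction
additive <- lm(mpg ~ cyl + am, data = cars)  # interaction removed
only_am  <- lm(mpg ~ am,       data = cars)
only_cyl <- lm(mpg ~ cyl,      data = cars)

# Numerator SS: drop in residual SS between models that respect marginality
ss <- c(
  cyl      = deviance(only_am)  - deviance(additive),
  am       = deviance(only_cyl) - deviance(additive),
  "cyl:am" = deviance(additive) - deviance(full)
)
num_df <- c(
  cyl      = df.residual(only_am)  - df.residual(additive),
  am       = df.residual(only_cyl) - df.residual(additive),
  "cyl:am" = df.residual(additive) - df.residual(full)
)

# Denominator: residual mean square of the full model (assumption, see above)
mse_full <- deviance(full) / df.residual(full)
f_value  <- (ss / num_df) / mse_full
p_value  <- pf(f_value, num_df, df.residual(full), lower.tail = FALSE)
cbind(ss, num_df, f_value, p_value)
```

The main-effect rows never involve a model containing cyl:am in the numerator, which is the sense in which the interaction is "removed" for those tests.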

So without any other info (and unless ?Anova has indications to the contrary for these specific models), by looking at the table above I would assume that:

the F-test associated with diet:time_fac interaction term was estimated by comparing the full model against the model that doesn't include the interaction term (as usual).
the F-test associated with the time_fac main-effect regressor was estimated by comparing two models that both had the interaction term removed: the model that includes both diet and time_fac vs the model that includes only diet.
the F-test associated with the diet main-effect regressor is similar to the above: the model that includes both diet and time_fac vs the model that includes only time_fac.

So to answer your question, the interaction term is automatically removed as required by the principle of marginality, so the main effects are tested without the confounding effect of the interaction term. If the interaction term is significant, you disregard the main effects; otherwise, you consider the main effects alone.

Related Solutions

Solved – How to specify specific contrasts for repeated measures ANOVA using car

This method is generally considered "old-fashioned" so while it may be possible, the syntax is difficult and I suspect fewer people know how to manipulate the anova commands to get what you want. The more common method is using glht with a likelihood-based model from nlme or lme4. (I'm certainly welcome to be proved wrong by other answers though.)

That said, if I needed to do this, I wouldn't bother with the anova commands; I'd just fit the equivalent model using lm, pick out the right error term for this contrast, and compute the F test myself (or equivalently, t test since there's only 1 df). This requires everything to be balanced and have sphericity, but if you don't have that, you should probably be using a likelihood-based model anyway. You might be able to somewhat correct for non-sphericity using the Greenhouse-Geisser or Huynh-Feldt corrections which (I believe) use the same F statistic but modify the df of the error term.

If you really want to use car, you might find the heplot vignettes helpful; they describe how the matrices in the car package are defined.

Using caracal's method (for the contrasts 1&2 - 3 and 1&2 - 4&5), I get

      psiHat      tStat          F         pVal
1 -3.0208333 -7.2204644 52.1351067 2.202677e-09
2 -0.2083333 -0.6098777  0.3719508 5.445988e-01

This is how I'd get those same p-values:

Reshape the data into long format and run lm to get all the SS terms.

library(car)       # provides the OBrienKaiser data
library(reshape2)
d <- OBrienKaiser
d$id <- factor(1:nrow(d))
dd <- melt(d, id.vars=c(18,1:2), measure.vars=3:17)
dd$hour <- factor(as.numeric(gsub("[a-z.]*","",dd$variable)))
dd$phase <- factor(gsub("[0-9.]*","", dd$variable), 
                   levels=c("pre","post","fup"))
m <- lm(value ~ treatment*hour*phase + treatment*hour*phase*id, data=dd)
anova(m)

Make an alternate contrast matrix for the hour term.

foo <- matrix(0, nrow=nrow(dd), ncol=4)
foo[dd$hour %in% c(1,2) ,1] <- 0.5
foo[dd$hour %in% c(3) ,1] <- -1
foo[dd$hour %in% c(1,2) ,2] <- 0.5
foo[dd$hour %in% c(4,5) ,2] <- -0.5
foo[dd$hour %in% 1 ,3] <- 1
foo[dd$hour %in% 2 ,3] <- 0
foo[dd$hour %in% 4 ,4] <- 1
foo[dd$hour %in% 5 ,4] <- 0

Check that my contrasts give the same SS as the default contrasts (and the same as from the full model).

anova(lm(value ~ hour, data=dd))
anova(lm(value ~ foo, data=dd))

Get the SS and df for just the two contrasts I want.

anova(lm(value ~ foo[,1], data=dd))
anova(lm(value ~ foo[,2], data=dd))

Get the p-values.

F <- 73.003/(72.81/52)
pf(F, 1, 52, lower=FALSE)
# [1] 2.201148e-09
F <- .5208/(72.81/52)
pf(F, 1, 52, lower=FALSE)
# [1] 0.5445999
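Since each contrast has 1 df, the t and F versions of the test are interchangeable. The short check below uses the tStat and F values from the psiHat table earlier and assumes 52 error df, as in the pf() calls above; the variable names are made up for illustration.

```r
# tStat and F copied from the contrast table above
t_stat <- c(-7.2204644, -0.6098777)
f_stat <- c(52.1351067, 0.3719508)
df_err <- 52  # assumed error df, as used in the pf() calls above

# With 1 numerator df, F is the square of t
all.equal(t_stat^2, f_stat, tolerance = 1e-6)

# The two-sided t-test p-value equals the upper-tail F-test p-value
p_t <- 2 * pt(-abs(t_stat), df_err)
p_f <- pf(t_stat^2, 1, df_err, lower.tail = FALSE)
all.equal(p_t, p_f)
```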

Optionally adjust for sphericity.

pf(F, 1*.48867, 52*.48867, lower=FALSE)
pf(F, 1*.57413, 52*.57413, lower=FALSE)

Solved – Specifying the Error() term in repeated measures ANOVA in R

It would be

 radpos.aov <- aov(WD ~ Species*Radialposition + Error(Individual/(Radialposition)), data=Radpos)
 summary(radpos.aov, type=3)

That accounts for the within-subject error of Radialposition. If you have other within-subject factors, throw them in (as an interaction) with Radialposition in the Error denominator, like + Error(Individual/(Radialposition*wiFactorA)). That's my understanding of it. It matches up with SPSS's repeated measures GLM if you have no missing data.
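Here is a minimal sketch of the Error() syntax on a hypothetical toy data set with one between-subject factor (group) and one within-subject factor (time). The data, variable names and effect sizes are all invented for illustration. It also shows what happens to the within-subject error stratum when the interaction is left out of the formula, which is one way to fit a main-effects-only repeated measures model in long format.

```r
# Hypothetical toy data: 6 subjects (id) in 2 groups (between-subject factor),
# each measured at 3 time points (within-subject factor)
set.seed(1)
toy <- expand.grid(time = factor(1:3), id = factor(1:6))
toy$group <- factor(ifelse(as.integer(toy$id) <= 3, "a", "b"))
toy$y <- 10 + 2 * as.integer(toy$time) + rnorm(nrow(toy))

# Error(id/time) expands to id + id:time, giving a between-subject stratum
# and a within-subject stratum
fit_full <- aov(y ~ group * time + Error(id / time), data = toy)
summary(fit_full)

# Dropping group:time from the formula pools it into the within-subject error
fit_main <- aov(y ~ group + time + Error(id / time), data = toy)
summary(fit_main)

# Residual df in the within-subject stratum, with and without the interaction
c(full = fit_full[["id:time"]]$df.residual,
  main = fit_main[["id:time"]]$df.residual)
```

In the second fit the 2 df of the interaction move into the id:time residuals, so the time effect is tested against a pooled error term.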
