R ANOVA – Specifying Between Subject Factors in aov() Mixed Design

anova, r, repeated measures

An R cookbook http://www.cookbook-r.com/Statistical_analysis/ANOVA/ has an example of using aov() for mixed design ANOVAs.

I'll copy it here:

data <- read.table(header=T, con <- textConnection('
 subject sex   age before after
       1   F   old    9.5   7.1
       2   M   old   10.3  11.0
       3   M   old    7.5   5.8
       4   F   old   12.4   8.8
       5   M   old   10.2   8.6
       6   M   old   11.0   8.0
       7   M young    9.1   3.0
       8   F young    7.9   5.2
       9   F   old    6.6   3.4
      10   M young    7.7   4.0
      11   M young    9.4   5.3
      12   M   old   11.6  11.3
      13   M young    9.9   4.6
      14   F young    8.6   6.4
      15   F young   14.3  13.5
      16   F   old    9.2   4.7
      17   M young    9.8   5.1
      18   F   old    9.9   7.3
      19   F young   13.0   9.5
      20   M young   10.2   5.4
      21   M young    9.0   3.7
      22   F young    7.9   6.2
      23   M   old   10.1  10.0
      24   M young    9.0   1.7
      25   M young    8.6   2.9
      26   M young    9.4   3.2
      27   M young    9.7   4.7
      28   M young    9.3   4.9
      29   F young   10.7   9.8
      30   M   old    9.3   9.4
'))
close(con)

Then reshape it:

library(reshape2)

# Make sure subject column is a factor
data$subject <- factor(data$subject) 

# Convert it to long format
data.long <- melt(data, id = c("subject","sex","age"), # keep these columns the same
              measure = c("before","after"),       # Put these two columns into a new column
              variable.name="time")                # Name of the new column

# subject sex   age   time value
#       1   F   old before   9.5
#       2   M   old before  10.3
#...

Now analyze using a mixed anova:

aov.after.age.time <- aov(value ~ age*time + Error(subject/time), data=data.long)
summary(aov.after.age.time)

But when there are more than two predictor variables, the cookbook's examples add the between-subject factors again after the error term:

#e.g., from R cookbook
#aov.bww <- aov(y ~ b1*b2*w1 + Error(subject/(w1)) + b1*b2, data=data.long)

# which would translate in our case as:
aov.bww <- aov(value ~ sex*age*time + Error(subject/time) + sex*age, data=data.long)
summary(aov.bww)

But why is b1*b2, or in our case sex*age, specified twice? It doesn't seem to make a difference when we remove them after the Error() term:

aov.bww2 <- aov(value ~ sex*age*time + Error(subject/time), data=data.long)
summary(aov.bww2)

Can anyone explain why the examples have those extra terms? The R manual just has this example, where the between factors are not specified twice:

# fm <- aov(yield ~ v + n*p*k + Error(farms/blocks), data=farm.data)

Edit:

I have checked the references from the R Cookbook and found other web sites also specify the terms twice in their mixed design examples. See here:
http://www.personality-project.org/R/r.anova.html
where they have the example:

aov.ex5 = aov(Recall ~ (Task*Valence*Gender*Dosage) +
Error(Subject/(Task*Valence)) + (Gender*Dosage), data.example5 )

and see here
http://www.statmethods.net/stats/anova.html
with their example:

# Two Within Factors W1 W2, Two Between Factors B1 B2
fit <- aov(y ~ (W1*W2*B1*B2) + Error(Subject/(W1*W2)) + (B1*B2),
data=mydataframe)

Which is presumably where the cookbook got their info from.

Best Answer

No, it is not necessary to specify those terms twice. I suspect it was either a copy/paste typo, or that the author wanted to denote separately the terms that use the subject term for the denominator in the F test and the terms that use the subject/time term. As you note, when the code is run, however, the terms are absolutely unnecessary.
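One way to see this without fitting anything is to compare the term labels R derives from the two formulas. The sketch below is only an illustration: the names f.twice, f.once, labels.twice and labels.once are made up here, and it assumes that terms() collapses repeated terms, which is how R's formula handling normally behaves.

```r
# Formula with the between-subject terms repeated after Error(), as in the cookbook
f.twice <- value ~ sex*age*time + Error(subject/time) + sex*age
# Formula with the between-subject terms given only once
f.once  <- value ~ sex*age*time + Error(subject/time)

# Expand both formulas into their term labels; no data are needed for this step
labels.twice <- attr(terms(f.twice, specials = "Error"), "term.labels")
labels.once  <- attr(terms(f.once,  specials = "Error"), "term.labels")

print(labels.twice)
print(labels.once)

# If the repeated sex*age terms are dropped, the two sets of labels match
print(identical(labels.twice, labels.once))
```

If the two label vectors match, aov() receives the same model in both cases, which would explain why the summaries do not change.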

In this case, also note that the /time part of the Error call is unnecessary: with only one observation per subject and time point, the subject:time interaction is the lowest level, which is always included in the model. So Error(subject) and Error(subject/time) give the same result; the only difference is that in the output, that stratum is called "Within" for the former and "subject:time" for the latter.
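A minimal sketch of that comparison follows. It uses a small simulated data set rather than the cookbook data; the data frame d, the seed, the sample size and the object names are all assumptions chosen for illustration.

```r
set.seed(42)
n <- 12  # hypothetical number of subjects, half "old" and half "young"

# Simulated long-format data: one value per subject and time point
d <- data.frame(
  subject = factor(rep(seq_len(n), times = 2)),
  age     = factor(rep(rep(c("old", "young"), each = n / 2), times = 2)),
  time    = factor(rep(c("before", "after"), each = n)),
  value   = rnorm(2 * n, mean = 10)
)

fit.nested <- aov(value ~ age * time + Error(subject / time), data = d)
fit.plain  <- aov(value ~ age * time + Error(subject), data = d)

# The strata are the same apart from the label of the last one
print(names(fit.nested))
print(names(fit.plain))

# Compare the F values in the lowest stratum (time and age:time)
tab.nested <- summary(fit.nested)[[2]][[1]]
tab.plain  <- summary(fit.plain)[[2]][[1]]
print(all.equal(tab.nested[["F value"]], tab.plain[["F value"]]))
```

The equivalence relies on there being a single observation in each subject by time cell; with replicates inside cells, the two Error() specifications would no longer describe the same strata.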

Related Solutions

Solved – Repeated-measures error in R ezANOVA using more levels than subjects (balanced design)

This issue is described in this post by John Fox - author of the car::Anova() function that is used internally by ezANOVA().

As a workaround, you can use anova() using a multivariate model specification that is described in this article by Peter Dalgaard as well as in this excellent answer by Aaron. Here's a reproducible example with data in wide format:

set.seed(123)  ## make reproducible
N  <- 18       ## number of subjects
P  <- 3        ## number of conditions
Q  <- 29       ## number of sites
voltage <- matrix(round(rnorm(N*P*Q), 2), nrow=N)   ## (N x (PxQ))-matrix with voltages

fit  <- lm(voltage ~ 1)   ## between-subjects design (here: no between factors)
inDf <- expand.grid(channel=gl(P, 1), electrode=gl(Q, 1))  ## within design
library(car)              ## for Anova()
AnRes <- Anova(fit, idata=inDf, idesign=~channel*electrode)
summary(AnRes, multivariate=FALSE, univariate=TRUE)

Due to the singular SSP-matrix, this does not return sphericity-corrected p-values:

Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
                      SS num Df Error SS den Df      F Pr(>F)
(Intercept)        0.862      1    14.08     17 1.0413 0.3218
channel            1.815      2    42.62     34 0.7237 0.4923
electrode         21.018     28   439.35    476 0.8132 0.7408
channel:electrode 56.375     56   945.04    952 1.0141 0.4484
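The singularity can be illustrated with a simplified sketch that looks only at the electrode factor. This is not the computation car::Anova() performs internally; the matrices Y, C, D and SSP are hypothetical stand-ins, and the point is just that 18 subjects leave at most 17 error degrees of freedom for a 28 x 28 SSP matrix.

```r
set.seed(123)
N <- 18   # number of subjects
Q <- 29   # number of levels of the within factor (electrode)

# Simulated wide-format responses: one row per subject, one column per level
Y <- matrix(rnorm(N * Q), nrow = N)

# Normalised Helmert contrasts spanning the Q - 1 within-subject differences
C <- contr.helmert(Q)
C <- sweep(C, 2, sqrt(colSums(C^2)), "/")

# Transformed responses and their residual SSP matrix for an intercept-only model
D   <- Y %*% C
E   <- scale(D, center = TRUE, scale = FALSE)
SSP <- crossprod(E)

ssp.rank <- qr(SSP)$rank
print(dim(SSP))              # dimension of the SSP matrix
print(ssp.rank)              # numerical rank, bounded by N - 1
print(ssp.rank < ncol(SSP))  # TRUE means the SSP matrix is singular
```

Because the rank cannot exceed N - 1, any within-subject effect with more degrees of freedom than that yields a singular SSP matrix.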

Instead, use anova() with the multivariate model (shortened output). Test for channel:

> anova(fit, M=~channel, X=~1, idata=inDf, test="Spherical")
Greenhouse-Geisser epsilon: 0.9569
Huynh-Feldt epsilon:        1.0754

            Df      F num Df den Df  Pr(>F)  G-G Pr  H-F Pr
(Intercept)  1 0.7237      2     34 0.49225 0.48682 0.49225
Residuals   17

Test for electrode:

> anova(fit, M=~channel + electrode, X=~channel, idata=inDf, test="Spherical")
Greenhouse-Geisser epsilon: 0.3729
Huynh-Feldt epsilon:        1.0126

            Df      F num Df den Df  Pr(>F)  G-G Pr  H-F Pr
(Intercept)  1 0.8132     28    476 0.74076 0.62102 0.74076
Residuals   17

Test for channel:electrode interaction:

> anova(fit, M=~channel + electrode + channel:electrode, X=~channel + electrode, idata=inDf, test="Spherical")
Greenhouse-Geisser epsilon: 0.233
Huynh-Feldt epsilon:        1.052

            Df      F num Df den Df  Pr(>F) G-G Pr  H-F Pr
(Intercept)  1 1.0141     56    952 0.44836 0.4386 0.44836
Residuals   17

Solved – why does the same repeated measures anova using ezANOVA() vs. aov() yield different distributions of model residuals

First, there is no aov function in the car package, so I'm guessing you are referring to aov from the stats package.

Second, judging from the QQ-plots of the residuals obtained via aov and ezANOVA, you are apparently not using the same data: the aov plot shows many more data points than the ezANOVA plot.

Third, I suspect that since you're not using identical data, this is the reason why you get different residuals. I just compared the residuals with a toy example, and both aov and ezANOVA yield identical results:

library(ez)

rating <- c(8, 9, 6, 5, 8, 7, 10, 12, 7, 5, 2, 3, 4, 5, 2, 6, 1, 2, 3, 1, 5, 6, 7, 8, 6, 5, 8, 9, 8, 7, 2, 1)
group <- factor(rep(1:4, each=8, len=32))
id <- factor(rep(1:8, times=4))
df <- data.frame(id, group, rating)

mod.ez <- ezANOVA(df, rating, id, within=group, return_aov=T)
mod.aov <- aov(rating ~ group + Error(id/group), df)

res.ez <- sort(proj(mod.ez$aov)[[3]][, "Residuals"])
res.aov <- sort(proj(mod.aov)[[3]][, "Residuals"])

res.ez - res.aov

plot(res.ez, res.aov)

So except for numerical errors, the residuals from the two methods are identical.
