R ANOVA – Specifying Between Subject Factors in aov() Mixed Design

anova, r, repeated measures

The R cookbook at http://www.cookbook-r.com/Statistical_analysis/ANOVA/ has an example of using aov() for mixed-design ANOVAs.

I'll copy it here:

data <- read.table(header=T, con <- textConnection('
 subject sex   age before after
       1   F   old    9.5   7.1
       2   M   old   10.3  11.0
       3   M   old    7.5   5.8
       4   F   old   12.4   8.8
       5   M   old   10.2   8.6
       6   M   old   11.0   8.0
       7   M young    9.1   3.0
       8   F young    7.9   5.2
       9   F   old    6.6   3.4
      10   M young    7.7   4.0
      11   M young    9.4   5.3
      12   M   old   11.6  11.3
      13   M young    9.9   4.6
      14   F young    8.6   6.4
      15   F young   14.3  13.5
      16   F   old    9.2   4.7
      17   M young    9.8   5.1
      18   F   old    9.9   7.3
      19   F young   13.0   9.5
      20   M young   10.2   5.4
      21   M young    9.0   3.7
      22   F young    7.9   6.2
      23   M   old   10.1  10.0
      24   M young    9.0   1.7
      25   M young    8.6   2.9
      26   M young    9.4   3.2
      27   M young    9.7   4.7
      28   M young    9.3   4.9
      29   F young   10.7   9.8
      30   M   old    9.3   9.4
'))
close(con)

Then reshape it:

library(reshape2)

# Make sure subject column is a factor
data$subject <- factor(data$subject) 

# Convert it to long format
data.long <- melt(data, id = c("subject","sex","age"), # keep these columns the same
              measure = c("before","after"),       # Put these two columns into a new column
              variable.name="time")                # Name of the new column

# subject sex   age   time value
#       1   F   old before   9.5
#       2   M   old before  10.3
#...

Now analyze it with a mixed ANOVA:

aov.after.age.time <- aov(value ~ age*time + Error(subject/time), data=data.long)
summary(aov.after.age.time)

But when there are more than two predictor variables, the R examples show that the between subject factors are added again after the error term:

#e.g., from R cookbook
#aov.bww <- aov(y ~ b1*b2*w1 + Error(subject/(w1)) + b1*b2, data=data.long)

# which would translate in our case as:
aov.bww <- aov(value ~ sex*age*time + Error(subject/time) + sex*age, data=data.long)
summary(aov.bww)

But why is b1*b2, or in our case sex*age, specified twice? It doesn't seem to make a difference when we remove them after the Error() term:

aov.bww2 <- aov(value ~ sex*age*time + Error(subject/time), data=data.long)
summary(aov.bww2)

Can anyone explain why the examples have those extra terms? The R manual just has this example, where the between factors are not specified twice:

# fm <- aov(yield ~ v + n*p*k + Error(farms/blocks), data=farm.data)

Edit:

I have checked the references from the R Cookbook and found other web sites also specify the terms twice in their mixed design examples. See here:
http://www.personality-project.org/R/r.anova.html
where they have the example:

aov.ex5 = aov(Recall ~ (Task*Valence*Gender*Dosage) +
Error(Subject/(Task*Valence)) + (Gender*Dosage), data.example5 )

and see here
http://www.statmethods.net/stats/anova.html
with their example:

# Two Within Factors W1 W2, Two Between Factors B1 B2
fit <- aov(y ~ (W1*W2*B1*B2) + Error(Subject/(W1*W2)) + (B1*B2),
data=mydataframe)

These are presumably where the cookbook got its information.

Best Answer

No, it is not necessary to specify those terms twice. I suspect it was either a copy/paste error, or the author wanted to mark separately the terms whose F tests use the subject term as the denominator and the terms that use the subject/time term. As you note, though, running the code shows that the repeated terms have no effect on the result.
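
If you want to check this on something other than the cookbook data, here is a minimal sketch. The data are simulated and the names wide, long, fit_twice and fit_once are made up for the illustration; it only uses base R. It fits the model with and without the repeated between-subject terms and compares the ANOVA table of each error stratum.

```r
# Simulated mixed design (hypothetical data, not the cookbook data set):
# 24 subjects, between-subject factors sex and age, within-subject factor time.
set.seed(1)
n <- 24
wide <- data.frame(
  subject = factor(seq_len(n)),
  sex     = factor(rep(c("F", "M"), each = n / 2)),
  age     = factor(rep(c("old", "young"), times = n / 2))
)
wide$before <- rnorm(n, mean = 10, sd = 1.5)
wide$after  <- wide$before - 2 + rnorm(n, sd = 1)

# Long format built by hand: one row per subject and time point
long <- data.frame(
  subject = rep(wide$subject, 2),
  sex     = rep(wide$sex, 2),
  age     = rep(wide$age, 2),
  time    = factor(rep(c("before", "after"), each = n),
                   levels = c("before", "after")),
  value   = c(wide$before, wide$after)
)

# Between-subject terms written twice, as in the cookbook example
fit_twice <- aov(value ~ sex*age*time + Error(subject/time) + sex*age,
                 data = long)
# Between-subject terms written once
fit_once  <- aov(value ~ sex*age*time + Error(subject/time), data = long)

# Extract the ANOVA table of each error stratum from the summaries
tables_twice <- lapply(summary(fit_twice), function(s) s[[1]])
tables_once  <- lapply(summary(fit_once),  function(s) s[[1]])

# Compare the two sets of tables
print(all.equal(tables_twice, tables_once))
```

The formula machinery drops duplicated terms before the model is fitted, which is why the second sex*age should not change anything.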

Also notice that in this case the /time part of the Error() call is unnecessary: the subject:time interaction is the lowest level, which is always included in the model. So Error(subject) and Error(subject/time) give the same result; the only difference is that in the output this level of results is labelled "Within" for the first and "subject:time" for the second.
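
A quick way to see the relabelling, again as a sketch on simulated data (the data set and the names long, fit_nested, fit_plain and stratum_tables are assumptions for illustration, not taken from the cookbook):

```r
# Hypothetical data: 20 subjects, one between factor (age), one within factor (time)
set.seed(2)
long <- expand.grid(
  time    = factor(c("before", "after"), levels = c("before", "after")),
  subject = factor(1:20)
)
id <- as.integer(long$subject)
long$age <- factor(ifelse(id <= 10, "old", "young"))

# Subject-specific offset plus a drop of 2 units after treatment, plus noise
subject_effect <- rnorm(20, sd = 1.5)
long$value <- 10 + subject_effect[id] - 2 * (long$time == "after") +
  rnorm(nrow(long))

fit_nested <- aov(value ~ age*time + Error(subject/time), data = long)
fit_plain  <- aov(value ~ age*time + Error(subject),      data = long)

# Names of the error strata; the lowest one is labelled differently
print(names(fit_nested))
print(names(fit_plain))

# Compare the ANOVA tables stratum by stratum, ignoring the stratum labels
stratum_tables <- function(fit) {
  unname(lapply(summary(fit), function(s) s[[1]]))
}
print(all.equal(stratum_tables(fit_nested), stratum_tables(fit_plain)))
```

Writing the /time part does no harm, and some people keep it because it names the within-subject stratum explicitly in the output.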
