An R cookbook http://www.cookbook-r.com/Statistical_analysis/ANOVA/ has an example of using aov() for mixed design ANOVAs.
I'll copy it here:
data <- read.table(header=TRUE, con <- textConnection('
subject sex age before after
1 F old 9.5 7.1
2 M old 10.3 11.0
3 M old 7.5 5.8
4 F old 12.4 8.8
5 M old 10.2 8.6
6 M old 11.0 8.0
7 M young 9.1 3.0
8 F young 7.9 5.2
9 F old 6.6 3.4
10 M young 7.7 4.0
11 M young 9.4 5.3
12 M old 11.6 11.3
13 M young 9.9 4.6
14 F young 8.6 6.4
15 F young 14.3 13.5
16 F old 9.2 4.7
17 M young 9.8 5.1
18 F old 9.9 7.3
19 F young 13.0 9.5
20 M young 10.2 5.4
21 M young 9.0 3.7
22 F young 7.9 6.2
23 M old 10.1 10.0
24 M young 9.0 1.7
25 M young 8.6 2.9
26 M young 9.4 3.2
27 M young 9.7 4.7
28 M young 9.3 4.9
29 F young 10.7 9.8
30 M old 9.3 9.4
'))
close(con)
Then reshape it:
library(reshape2)
# Make sure subject column is a factor
data$subject <- factor(data$subject)
# Convert it to long format
data.long <- melt(data,
                  id = c("subject", "sex", "age"),  # keep these columns the same
                  measure = c("before", "after"),   # put these two columns into a new column
                  variable.name = "time")           # name of the new column
# subject sex age time value
# 1 F old before 9.5
# 2 M old before 10.3
#...
Now analyze using a mixed ANOVA:
aov.after.age.time <- aov(value ~ age*time + Error(subject/time), data=data.long)
summary(aov.after.age.time)
But when there are more than two predictor variables, the R examples add the between-subject factors again after the Error() term:
#e.g., from R cookbook
#aov.bww <- aov(y ~ b1*b2*w1 + Error(subject/(w1)) + b1*b2, data=data.long)
# which would translate in our case as:
aov.bww <- aov(value ~ sex*age*time + Error(subject/time) + sex*age, data=data.long)
summary(aov.bww)
But why is b1*b2, or in our case sex*age, specified twice? It doesn't seem to make a difference when we remove them after the Error() term:
aov.bww2 <- aov(value ~ sex*age*time + Error(subject/time), data=data.long)
summary(aov.bww2)
Can anyone explain why the examples have those extra terms? The R manual just has this example, where the between factors are not specified twice:
# fm <- aov(yield ~ v + n*p*k + Error(farms/blocks), data=farm.data)
Edit:
I have checked the references from the R Cookbook and found other web sites also specify the terms twice in their mixed design examples. See here:
http://www.personality-project.org/R/r.anova.html
where they have the example:
aov.ex5 = aov(Recall ~ (Task*Valence*Gender*Dosage) +
              Error(Subject/(Task*Valence)) + (Gender*Dosage), data.example5)
and see here
http://www.statmethods.net/stats/anova.html
with their example:
# Two Within Factors W1 W2, Two Between Factors B1 B2
fit <- aov(y ~ (W1*W2*B1*B2) + Error(Subject/(W1*W2)) + (B1*B2),
data=mydataframe)
This is presumably where the cookbook got its information.
Best Answer
No, it is not necessary to specify those terms twice. I suspect it was either a copy/paste typo, or the author wanted to mark separately the terms whose F tests use the subject stratum as the denominator and the terms that use the subject/time stratum. As you note, though, running the code shows that the repeated terms have no effect on the result.
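One way to check this is to fit both versions and compare them. The sketch below uses a small simulated data set (hypothetical, not the cookbook data) with the same layout as data.long; the names toy.long, f.twice, f.once, same.terms and same.tables are made up for the illustration. It relies on the fact that duplicated terms are dropped when R expands the formula.

```r
# Hypothetical toy data (NOT the cookbook data): 12 subjects, balanced design.
set.seed(1)
n <- 12
subject <- factor(seq_len(n))
sex <- factor(rep(c("F", "M"), times = n / 2))
age <- factor(rep(c("old", "young"), each = n / 2))

# Long format built in base R: one row per subject and time point.
toy.long <- data.frame(
  subject = rep(subject, times = 2),
  sex     = rep(sex, times = 2),
  age     = rep(age, times = 2),
  time    = factor(rep(c("before", "after"), each = n),
                   levels = c("before", "after")),
  value   = c(rnorm(n, mean = 10), rnorm(n, mean = 7))
)

# Same model written with and without the repeated between-subject terms.
f.twice <- value ~ sex*age*time + Error(subject/time) + sex*age
f.once  <- value ~ sex*age*time + Error(subject/time)

# The expanded term lists are compared first, then the ANOVA tables.
same.terms <- identical(attr(terms(f.twice), "term.labels"),
                        attr(terms(f.once),  "term.labels"))

fit.twice <- aov(f.twice, data = toy.long)
fit.once  <- aov(f.once,  data = toy.long)
same.tables <- isTRUE(all.equal(summary(fit.twice), summary(fit.once)))

print(c(same.terms = same.terms, same.tables = same.tables))
```

If both flags come back TRUE, the second sex*age contributes nothing beyond what sex*age*time already expands to.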
In this case, also notice that the /time part of the Error call is unnecessary; the subject:time interaction is the lowest level, which is always included in the model. So using Error(subject) and Error(subject/time) give the same result; the only difference is that in the output, that level of results is called "Within" for the first and is called "subject:time" for the second.
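The equivalence of the two Error() specifications can be sketched the same way. This again uses simulated data (hypothetical, not the cookbook data) and assumes, as in the question, exactly one observation per subject and time point, which is what makes subject:time the lowest level. The names toy, fit.short, fit.full and same.tables are illustrative only.

```r
# Hypothetical toy data: 8 subjects, each measured once before and once after.
set.seed(2)
toy <- expand.grid(
  time    = factor(c("before", "after"), levels = c("before", "after")),
  subject = factor(1:8)
)
# Between-subject factor: the first four subjects are "old", the rest "young".
toy$age   <- factor(ifelse(as.integer(toy$subject) <= 4, "old", "young"))
toy$value <- rnorm(nrow(toy), mean = 8)

fit.short <- aov(value ~ age*time + Error(subject),      data = toy)
fit.full  <- aov(value ~ age*time + Error(subject/time), data = toy)

# Only the label of the lowest stratum should differ.
print(names(fit.short))
print(names(fit.full))

# Compare the ANOVA tables stratum by stratum, ignoring the stratum labels.
same.tables <- isTRUE(all.equal(unname(unclass(summary(fit.short))),
                                unname(unclass(summary(fit.full)))))
print(same.tables)
```

With more than one observation per subject:time cell the two calls would no longer be interchangeable, since a separate residual stratum would then sit below subject:time.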