The title says it all, and I'm confused. The following runs a repeated measures aov() in R, and runs what I thought was an equivalent lm() call, but they return different error residuals (although the sums of squares are the same).
Clearly the residuals and fitted values from aov() are the ones used in the model, because their sums of squares add up to each of the model/residual sums of squares reported in summary(my.aov). So what are the actual linear models that are applied to a repeated measures design?
set.seed(1)
# make data frame,
# 5 participants, with 2 experimental factors, each with 2 levels
# factor1 is A, B
# factor2 is 1, 2
DF <- data.frame(participant=factor(1:5), A.1=rnorm(5, 50, 20), A.2=rnorm(5, 100, 20), B.1=rnorm(5, 20, 20), B.2=rnorm(5, 50, 20))
# get our experimental conditions
conditions <- names(DF)[ names(DF) != "participant" ]
# reshape it for aov
DFlong <- reshape(DF, direction="long", varying=conditions, v.names="value", idvar="participant", times=conditions, timevar="group")
# make the conditions separate variables called factor1 and factor2
DFlong$factor1 <- factor( rep(c("A", "B"), each=10) )
DFlong$factor2 <- factor( rep(c(1, 2), each=5) )
# call aov
my.aov <- aov(value ~ factor1*factor2 + Error(participant / (factor1*factor2)), DFlong)
# similar for an lm() call
fit <- lm(value ~ factor1*factor2 + participant, DFlong)
# what's aov telling us?
summary(my.aov)
# check SS residuals
sum(residuals(fit)^2) # == 5945.668
# check they add up to the residuals from summary(my.aov)
2406.1 + 1744.1 + 1795.46 # == 5945.66
# all good so far, but how are the residuals in the aov calculated?
my.aov$"participant:factor1"$residuals
#clearly these are the ones used in the ANOVA:
sum(my.aov$"participant:factor1"$residuals ^ 2)
# this corresponds to the factor1 residuals here:
summary(my.aov)
# but they are different to the residuals reported from lm()
residuals(fit)
my.aov$"participant"$residuals
my.aov$"participant:factor1"$residuals
my.aov$"participant:factor1:factor2"$residuals
Best Answer
One way to think about it is to treat the situation as a 3-factorial between subjects ANOVA with IVs
participant
,factor1
,factor2
, and a cell size of 1.anova(lm(value ~ factor1*factor2*participant, DFlong))
calculates all the SS for all effects in this 3-way ANOVA (3 main effects, 3 first-order interactions, 1 second-order interaction). Since there's only 1 person in each cell, the full model has no errors, and the above call toanova()
cannot compute F-tests. But the SS are the same as in the 2-factorial within design.How does
anova()
actually compute the SS for an effect? Through sequential model comparisons (type I): It fits a restricted model without the effect in question, and an unrestricted model which includes that effect. The SS associated with this effect is the difference in error SS between both models.Now let's check the effect SS associated with the interaction
id:IV1
by subtracting the error SS of the unrestricted model from the error SS of the restricted model.Now that you have all the "raw" effect SS, you can build the within-subjects tests simply by choosing the correct error term to test an effect SS against. E.g., test the effect SS for
factor1
against the interaction effect SS ofparticipant:factor1
.For an excellent introduction to the model comparison approach, I recommend Maxwell & Delaney (2004). Designing Experiments and Analyzing Data.