ANOVA – Repeated Measures ANOVA vs. Factorial ANOVA: Understanding Error Strata and Error() Term in AOV

anova, repeated-measures

Consider repeated measures ANOVA (RM-ANOVA) with one within-subject factor A and several measurements per subject for each level of A.

It is closely related to two-way ANOVA with two factors: A and subject. Both use an identical decomposition of the sum of squares into four parts: A, subject, A⋅subject, and residual. However, two-way ANOVA tests the effect of A by comparing the SS of A with the residual SS, while RM-ANOVA tests the effect of A by comparing the SS of A with the A⋅subject interaction SS.

Why the difference?

  1. Does this difference automatically follow from the repeated-measures structure of the data, or is it some convention?
  2. Does this difference between two-way ANOVA and RM-ANOVA correspond to testing two different nulls? If so, what exactly are they and why would we use different nulls in these two cases?
  3. Two-way ANOVA's test can be understood as an F-test between two nested models: the full model, and the model without A. Can RM-ANOVA be understood in a similar way?

(If there is only one measurement per subject for each level of A, then the distinction essentially disappears, because the A⋅subject variation and the residual variation cannot be disentangled: Is one-way repeated measures ANOVA equivalent to a two-way ANOVA?)
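As a quick illustration of that single-measurement case: averaging the repeated measures within each id × A cell reduces the data to one observation per cell, and the RM-ANOVA and the two-way ANOVA then give the same F test for A. A minimal sketch, assuming the data frame d with columns id, Xw1 and Y constructed in the demonstration below (dAvg is a name introduced only for this sketch):

# Average the repeated measures within each id x Xw1 cell (one value per cell)
dAvg = aggregate(Y ~ id + Xw1, data = d, FUN = mean)

summary(aov(Y ~ Xw1 + Error(id), dAvg))  # one-way RM-ANOVA
summary(aov(Y ~ Xw1 + id, dAvg))         # two-way ANOVA with subject as a factor
# Both report the same F test for Xw1, because with a single measurement per
# cell the Xw1:id variation and the residual variation coincide.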


Demonstration

I will use the toy data d2 generated at http://dwoll.de/rexrepos/posts/anovaMixed.html. The same webpage shows the correct syntax for RM-ANOVA.

# Discarding between-subject factors and leaving only one within-subject factor
d = d2[d2$Xb1=='CG' & d2$Xb2 == 'f', c(1,4,6)]

(See a reproducible version here on pastebin.) The data look like this:

     id Xw1     Y
1    s1   A  28.6
2    s1   A  96.6
3    s1   A  64.8
4    s1   B 107.5
5    s1   B  77.3
6    s1   B 120.9
7    s1   C 141.2
8    s1   C 124.1
9    s1   C  88.0
10   s2   A  86.7
...
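(If d2 from the linked page is not at hand, a structurally similar stand-in can be simulated; this is a hypothetical substitute with made-up effect sizes, and all outputs shown below come from the original d, not from this stand-in.)

# Hypothetical stand-in for d: 20 subjects, 3 levels of Xw1, 3 measurements
# per subject and level; effect sizes and noise level are invented.
set.seed(1)
d_sim = expand.grid(rep = 1:3, Xw1 = factor(c('A','B','C')),
                    id = factor(paste0('s', 1:20)))
d_sim$Y = 50 + 25*as.numeric(d_sim$Xw1) + rnorm(nrow(d_sim), sd = 50)
d_sim = d_sim[, c('id', 'Xw1', 'Y')]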

Here is two-way ANOVA: summary(aov(Y ~ Xw1*id, d))

             Df Sum Sq Mean Sq F value   Pr(>F)    
Xw1           2  95274   47637  16.789 3.73e-07 ***
id           19  31359    1650   0.582    0.913    
Xw1:id       38  71151    1872   0.660    0.929    
Residuals   120 340490    2837                 

Here is RM-ANOVA: summary(aov(Y ~ Xw1 + Error(id/Xw1), d))

Error: id
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 19  31359    1650               

Error: id:Xw1
          Df Sum Sq Mean Sq F value   Pr(>F)    
Xw1        2  95274   47637   25.44 9.73e-08 ***
Residuals 38  71151    1872                     

Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)
Residuals 120 340490    2837            

Note the identical SS decomposition, but two-way ANOVA tests Xw1 against the residual, while RM-ANOVA tests Xw1 against the Xw1:id interaction.

Why?

This question is related to How to write the error term in repeated measures ANOVA in R: Error(subject) vs Error(subject/time). If we use Error(id) instead of Error(id/Xw1) in the example above, then Xw1 gets tested against the Xw1:id interaction lumped together with the residual variation.
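A quick sketch of that lumping, using the same d as above:

summary(aov(Y ~ Xw1 + Error(id), d))
# Produces only two strata: 'Error: id' (19 df) and 'Error: Within', where Xw1
# (2 df) is tested against a pooled residual with 38 + 120 = 158 df
# (the Xw1:id variation plus the pure within-cell error).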

(The same issue arises in factorial RM-ANOVA with multiple within-subject factors, where each factor or interaction is tested against its own "error term" aka "error stratum". These error strata are always given by the corresponding interaction with the block/plot/subject variable id.)
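For example, with a hypothetical second within-subject factor Xw2 the usual syntax would be the following (dat and Xw2 are not part of this example's data):

# summary(aov(Y ~ Xw1*Xw2 + Error(id/(Xw1*Xw2)), data = dat))
# Strata: id, id:Xw1, id:Xw2, id:Xw1:Xw2; Xw1 is tested in id:Xw1,
# Xw2 in id:Xw2, and the Xw1:Xw2 interaction in id:Xw1:Xw2.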

Best Answer

... two-way ANOVA tests the effect of A by comparing SS of A with the residual SS, while RM-ANOVA tests the effect of A by comparing SS of A with the A⋅subject interaction SS.

1) Does this difference automatically follow from the repeated-measures structure of the data, or is it some convention?

It follows from the repeated-measures structure of the data. The basic principle of analysis of variance is that we compare the variation between the levels of a treatment to the variation between the units that received that treatment. What makes the repeated-measures case somewhat tricky is estimating this second variation.

In this simplest case, the quantity we're interested in is the difference between the levels of A. So how many units have we measured that difference on? It's the number of subjects, not the number of observations. That is, each subject, not each observation, gives us an additional independent piece of information about the difference. Adding more repeated measures increases the accuracy of our information about each subject, but doesn't give us more subjects.

By using the A⋅subject interaction as the error term, RM-ANOVA correctly uses the subject-to-subject variation in the differences between levels of A as the yardstick for testing the effect of A. Using the observation-level error instead uses the variation among the repeated measures within each individual, which is not the relevant variation for this test.
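One way to make this precise, as a sketch under the standard textbook mixed model (A fixed with I levels, subject random with J subjects, A⋅subject random, n replicates per cell; here I=3, J=20, n=3), is through the expected mean squares:

$$E[MS_A] = \sigma^2_\varepsilon + n\,\sigma^2_{A\cdot s} + \frac{nJ}{I-1}\sum_i \alpha_i^2, \qquad E[MS_{A\cdot s}] = \sigma^2_\varepsilon + n\,\sigma^2_{A\cdot s}, \qquad E[MS_{\text{resid}}] = \sigma^2_\varepsilon.$$

Under the null hypothesis (all $\alpha_i = 0$), $MS_A$ and $MS_{A\cdot s}$ have the same expectation, so their ratio is the appropriate F statistic; $MS_A / MS_{\text{resid}}$ is inflated whenever $\sigma^2_{A\cdot s} > 0$.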

Consider a case where you take more and more data on just a couple individuals. If using the observation level error, you would eventually reach statistical significance, even though you only have a couple individuals. You need more individuals, not more data on them, to really increase the power.
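A small simulation sketch of this point (hypothetical parameters; sim_once is a name introduced only here): three subjects, many repeated measures, no population-level effect of A, but genuine A⋅subject variation. The test against the observation-level residual rejects far too often, while the test against the A:subject stratum stays near its nominal level.

# Hypothetical simulation: 3 subjects, 50 measurements per subject per level,
# no population-level A effect, but subject-specific A effects.
set.seed(1)
sim_once <- function(n_subj = 3, n_rep = 50) {
  dat <- expand.grid(rep = 1:n_rep, A = factor(1:3), id = factor(1:n_subj))
  subj_eff <- matrix(rnorm(n_subj*3, sd = 5), n_subj, 3)   # A-by-subject effects
  dat$Y <- subj_eff[cbind(as.integer(dat$id), as.integer(dat$A))] +
    rnorm(nrow(dat), sd = 1)
  tab <- summary(aov(Y ~ A*id, dat))[[1]]
  rn  <- trimws(rownames(tab))
  p_wrong <- tab[rn == "A", "Pr(>F)"]                      # A tested against residual
  F_right <- tab[rn == "A", "Mean Sq"] / tab[rn == "A:id", "Mean Sq"]
  p_right <- pf(F_right, tab[rn == "A", "Df"], tab[rn == "A:id", "Df"],
                lower.tail = FALSE)                        # A tested against A:id
  c(wrong = p_wrong, right = p_right)
}
p <- replicate(1000, sim_once())
rowMeans(p < 0.05)   # rejection rates: 'wrong' far above 0.05, 'right' near 0.05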

2) Does this difference between two-way ANOVA and RM-ANOVA correspond to testing two different nulls? If so, what exactly are they and why would we use different nulls in these two cases?

Nope, it's the same null hypothesis: no effect of A. What differs is the error term used to build the F statistic, and hence the statistic's null distribution.

3) Two-way ANOVA's test can be understood as an F-test between two nested models: the full model, and the model without A. Can RM-ANOVA be understood in a similar way?

Yes, but perhaps not in the way you're hoping for. As you see in the output from aov, one way of thinking about these kinds of models is that they are really several models in one, with one model fitted within each error stratum.

One can fit the model for a higher stratum on its own by averaging the data over the lower strata. That is, the RM-ANOVA test for A is equivalent to a standard ANOVA on the per-cell averages. Then one can compare models in the usual way.

> library(plyr)
> # average the repeated measures within each id x Xw1 cell
> # (note that this overwrites the d2 loaded from the linked page)
> d2 <- ddply(d, ~Xw1 + id, summarize, Y=mean(Y))
> a1 <- aov(Y ~ id, d2)        # model without Xw1
> a2 <- aov(Y ~ Xw1 + id, d2)  # model with Xw1
> anova(a1, a2)
Analysis of Variance Table

Model 1: Y ~ id
Model 2: Y ~ Xw1 + id
  Res.Df   RSS Df Sum of Sq      F    Pr(>F)    
1     40 55475                                  
2     38 23717  2     31758 25.442 9.734e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
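For reference, the averaging step can also be done in base R, and a plain two-way ANOVA on the cell means reproduces the RM-ANOVA F for Xw1 directly (a sketch; dAvg2 is a name used only here):

dAvg2 <- aggregate(Y ~ Xw1 + id, data = d, FUN = mean)   # one mean per id x Xw1 cell
summary(aov(Y ~ Xw1 + id, dAvg2))
# The Xw1 row gives F = 25.44 on 2 and 38 df, matching the RM-ANOVA above.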

Alternatively, one can fit the full aov on all the data but without the term of interest, and then fit it again with that term included. To compare the two fits you then have to pick out the stratum you have changed (here the id:Xw1 stratum) and compare the two models within that stratum, as done below.

> summary(aov(Y ~ 1 + Error(id/Xw1), d))

Error: id
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 19  31359    1650               

Error: id:Xw1
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 40 166426    4161               

Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)
Residuals 120 340490    2837               
> # The SS for Xw1 is the drop in id:Xw1 residual SS when Xw1 is added
> # (166426 - 71151, on 2 df); the denominator is the full model's
> # id:Xw1 residual mean square (71151 / 38).
> (F <- ((166426 - 71151)/2) / (71151/38))
[1] 25.44202
> pf(F, 2, 38, lower=FALSE)
[1] 9.732778e-08