Would a paired t-test and a repeated measures ANOVA with two levels of the repeated measure give the same results on the same data? I ran a few tests and noticed that their p-values are the same. Does that mean the two methods are statistically equivalent? But I also find it strange that their normality assumptions differ slightly: the paired t-test requires the differences in the continuous variable to be normal, while repeated measures ANOVA requires the continuous variable to be normal at each level of the within-subject factor. Please advise.
Solved – Difference between paired t-test and repeated measures ANOVA with two levels of repeated measures
anova, paired-data, repeated-measures, t-test
Related Solutions
... two-way ANOVA tests the effect of A by comparing the SS of A with the residual SS, while RM-ANOVA tests the effect of A by comparing the SS of A with the A⋅subject interaction SS.
1) Does this difference automatically follow from the repeated-measures structure of the data, or is it some convention?
It follows from the repeated-measures structure of the data. The basic principle of analysis of variance is that we compare the variation between levels of a treatment to the variation between the units that received that treatment. What makes the repeated-measures case somewhat tricky is estimating this second variation.
In this simplest case, the thing we're interested in is the difference between the levels of A. So how many units have we measured that difference on? It's the number of subjects, not the number of observations. That is, each subject, not each observation, gives us an additional independent piece of information about the difference. Adding more repeated measures increases the accuracy of our information about each subject, but doesn't give us more subjects.
What the RM-ANOVA does when using the A⋅subject interaction as the error term is to correctly use the between-subject variation in the differences between levels of A as the variation against which to test the A effect. Using the observational error instead uses the variation in the repeated measures within each individual, which is not correct.
Consider a case where you take more and more data on just a couple of individuals. Using the observation-level error, you would eventually reach statistical significance, even though you only have a couple of individuals. You need more individuals, not more data on each of them, to really increase the power.
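That argument can be sketched numerically. Below is a pure-Python illustration (all numbers are made up) of what goes wrong: with only three subjects whose individual A effects disagree, the F ratio built on the observation-level error explodes as reps accumulate, while the F ratio built on the A⋅subject interaction stays modest.

```python
import random
from statistics import mean

random.seed(1)

# Made-up example: 3 subjects, 2 levels of A, 20 repeated measures per cell.
# Each subject has their own A effect, so the data carry only 3 independent
# pieces of information about the A effect, however many reps we take.
n_subj, n_rep = 3, 20
subj_base = [10.0, 12.0, 11.0]      # subject baselines
subj_eff = [1.5, -0.5, 0.8]         # each subject's own effect of A
data = {(s, a): [subj_base[s] + subj_eff[s] * a + random.gauss(0, 0.2)
                 for _ in range(n_rep)]
        for s in range(n_subj) for a in (0, 1)}

cell = {k: mean(v) for k, v in data.items()}
lev = {a: mean([cell[s, a] for s in range(n_subj)]) for a in (0, 1)}
subj = {s: mean([cell[s, a] for a in (0, 1)]) for s in range(n_subj)}
grand = mean(list(lev.values()))

# Sums of squares for the A effect, the A:subject interaction,
# and the observation-level (within-cell) residual.
ss_A = n_subj * n_rep * sum((lev[a] - grand) ** 2 for a in (0, 1))
ss_AxS = n_rep * sum((cell[s, a] - subj[s] - lev[a] + grand) ** 2
                     for s in range(n_subj) for a in (0, 1))
ss_within = sum((y - cell[k]) ** 2 for k, ys in data.items() for y in ys)

F_right = (ss_A / 1) / (ss_AxS / ((n_subj - 1) * (2 - 1)))       # RM-ANOVA
F_wrong = (ss_A / 1) / (ss_within / (n_subj * 2 * (n_rep - 1)))  # obs-level error

print(round(F_right, 2), round(F_wrong, 2))  # the wrong F is far larger
```

With these numbers the correct F is near 1 (three subjects, wildly disagreeing effects), while the observation-level F is in the hundreds, exactly the spurious significance described above.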
2) Does this difference between two-way ANOVA and RM-ANOVA correspond to testing two different nulls? If so, what exactly are they and why would we use different nulls in these two cases?
Nope, same null hypothesis. What's different is how we estimate the test statistic and its null distribution.
3) Two-way ANOVA's test can be understood as an F-test between two nested models: the full model, and the model without A. Can RM-ANOVA be understood in a similar way?
Yes, but perhaps not in the way you're hoping for. As you see in the output from aov, one way of thinking about these kinds of models is that they're really several models in one, with one model for each level.
One can fit the models for the higher levels individually by averaging the data over the lower levels. That is, an RM-ANOVA test for A is equivalent to a standard ANOVA on the averaged data. Then one can compare models in the usual way.
> library(plyr)
> d2 <- ddply(d, ~Xw1 + id, summarize, Y=mean(Y))
> a1 <- aov(Y ~ id, d2)
> a2 <- aov(Y ~ Xw1+id, d2)
> anova(a1, a2)
Analysis of Variance Table
Model 1: Y ~ id
Model 2: Y ~ Xw1 + id
Res.Df RSS Df Sum of Sq F Pr(>F)
1 40 55475
2 38 23717 2 31758 25.442 9.734e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Alternatively, one can fit the aov with all the data but without the term of interest, and then compare that fit with the full aov that includes the term of interest. To compare the models, though, you need to pick out the level of the model you've changed (here the id:Xw1 level), and then you can compare those two models.
> summary(aov(Y ~ 1 + Error(id/Xw1), d))
Error: id
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 19 31359 1650
Error: id:Xw1
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 40 166426 4161
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 120 340490 2837
> (F <- ((166426 - 71151)/2) / (71151/38))
[1] 25.44202
> pf(F, 2, 38, lower=FALSE)
[1] 9.732778e-08
(The 71151 here is the id:Xw1 residual SS from the corresponding fit that includes Xw1. Up to rounding it is three times the 23717 from the averaged-data fit, since each cell mean averaged three observations, and the F ratio and degrees of freedom come out the same.)
The difference comes from the fact that selecting observations with error == 0 gives you an unbalanced design. ezANOVA will take cell means for you and pretend that you have a balanced design, but aov will not.
Let's balance the design ourselves and see that the outputs match:
> data <- aggregate(rt ~ subnum + cue + flank, ANT[ANT$error == 0, ], mean)
> ezANOVA(data, dv=rt, wid=subnum, within=.(cue, flank))
$ANOVA
Effect DFn DFd F p p<.05 ges
2 cue 3 57 477.564650 2.435084e-40 * 0.86387868
3 flank 2 38 958.640865 3.040261e-33 * 0.90297213
4 cue:flank 6 114 4.047785 1.026734e-03 * 0.08633287
$`Mauchly's Test for Sphericity`
Effect W p p<.05
2 cue 0.8670854 0.77271988
3 flank 0.9088146 0.42293876
4 cue:flank 0.1506008 0.04917243 *
$`Sphericity Corrections`
Effect GGe p[GG] p[GG]<.05 HFe p[HF] p[HF]<.05
2 cue 0.9165014 3.647676e-37 * 1.086943 2.435084e-40 *
3 flank 0.9164345 1.182224e-30 * 1.009411 3.040261e-33 *
4 cue:flank 0.6261487 6.059761e-03 * 0.799682 2.641207e-03 *
> summary(aov(rt ~ cue * flank + Error(subnum / (cue * flank)), data))
Error: subnum
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 19 4247 223.5
Error: subnum:cue
Df Sum Sq Mean Sq F value Pr(>F)
cue 3 225486 75162 477.6 <2e-16 ***
Residuals 57 8971 157
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: subnum:flank
Df Sum Sq Mean Sq F value Pr(>F)
flank 2 330651 165326 958.6 <2e-16 ***
Residuals 38 6553 172
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: subnum:cue:flank
Df Sum Sq Mean Sq F value Pr(>F)
cue:flank 6 3357 559.5 4.048 0.00103 **
Residuals 114 15759 138.2
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
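The aggregation step above (R's aggregate(rt ~ subnum + cue + flank, ..., mean)) can be sketched in plain Python; the subject and condition names below are made up. Collapsing each subject × cue × flank cell to its mean leaves exactly one value per cell, which is what restores the balance ezANOVA assumes.

```python
# Made-up unbalanced data after dropping error trials:
# (subject, cue, flank) -> surviving reaction times, unequal counts per cell.
trials = {
    ("s1", "center", "congruent"):   [512.0, 498.0, 505.0],
    ("s1", "center", "incongruent"): [601.0, 590.0],            # one trial lost
    ("s2", "center", "congruent"):   [530.0, 522.0, 518.0, 527.0],
    ("s2", "center", "incongruent"): [615.0, 604.0, 610.0],
}

# Collapse each cell to its mean: the design is balanced again because
# every subject-by-condition cell now contributes exactly one number.
cell_means = {key: sum(rts) / len(rts) for key, rts in trials.items()}

for key, m in sorted(cell_means.items()):
    print(key, round(m, 1))
```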
Best Answer
Yes, they are equivalent. The question about the assumptions has never been directly addressed, though. It is sometimes indicated that the assumptions you cite for the ANOVA, when met, cover the normality assumption for the paired t-test. However, I still wonder: what about when the variables are not normal within each subgroup, but their differences (calculated as for the t-test) are normal? That should be enough, so the incongruence between these assumptions (as stated in every major statistics handbook) and the ones in your question bothers me too. ;)
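For the two-level case the equivalence can be checked directly. Here is a pure-Python sketch (toy numbers) showing that the two-level RM-ANOVA F statistic is exactly the square of the paired t statistic, which is why the p-values agree.

```python
import math
from statistics import mean, stdev

# Toy paired data: one measurement per subject under each of two conditions.
a = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 6.2, 5.3]
b = [5.6, 5.0, 6.4, 5.9, 5.5, 6.1, 6.5, 5.8]
n = len(a)

# Paired t statistic: t = mean(d) / (sd(d) / sqrt(n))
d = [y - x for x, y in zip(a, b)]
t = mean(d) / (stdev(d) / math.sqrt(n))

# Two-level RM-ANOVA: F = MS_condition / MS_(condition:subject)
grand = mean(a + b)
subj = [(x + y) / 2 for x, y in zip(a, b)]    # subject means
cond = [mean(a), mean(b)]                     # condition means
ss_cond = n * sum((m - grand) ** 2 for m in cond)   # df = 1
ss_err = sum((y - s - c + grand) ** 2               # df = n - 1
             for c, col in zip(cond, (a, b))
             for y, s in zip(col, subj))
F = (ss_cond / 1) / (ss_err / (n - 1))

print(round(F, 4), round(t ** 2, 4))  # identical up to floating point
```

Since F = t² with matching degrees of freedom (1 and n − 1 versus n − 1 for the t), the two procedures necessarily return the same p-value.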