Would a paired t-test and a repeated measures ANOVA with two levels of the repeated measure give the same results on the same data? I ran a few tests and noticed that their p-values are the same. Does that mean the two methods are statistically equivalent? But I also find it strange that their normality assumptions differ slightly: the paired t-test requires the differences in the continuous variable to be normal, while repeated measures ANOVA requires the continuous variable to be normal at each level of the within-subject factor. Please advise.
Solved – Difference between paired t-test and repeated measures ANOVA with two levels of repeated measures
anova, paired-data, repeated-measures, t-test
Related Solutions
... two-way ANOVA tests the effect of A by comparing the SS of A with the residual SS, while RM-ANOVA tests the effect of A by comparing the SS of A with the A⋅subject interaction SS.
1) Does this difference automatically follow from the repeated-measures structure of the data, or is it some convention?
It follows from the repeated-measures structure of the data. The basic principle of analysis of variance is that we compare the variation between levels of a treatment to the variation between the units that received that treatment. What makes the repeated-measures case somewhat tricky is estimating this second variation.
In this simplest case, the thing we're interested in is the difference between the levels of A. So how many units have we measured that difference on? It's the number of subjects, not the number of observations. That is, each subject, not each observation, gives us an additional independent piece of information about the difference. Adding more repeated measures increases the accuracy of our information about each subject, but doesn't give us more subjects.
What the RM-ANOVA does when using the A⋅subject interaction as the error term is to correctly use the between-subject variation in the differences between levels of A as the variation against which to test the A effect. Using the observational error instead uses the variation in the repeated measures within each individual, which is not correct.
Consider a case where you take more and more data on just a couple of individuals. Using the observation-level error, you would eventually reach statistical significance, even though you only have a couple of individuals. You need more individuals, not more data on each of them, to really increase the power.
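That argument can be sketched numerically. Below is a pure-Python illustration (all numbers are made up) of what goes wrong: with only three subjects whose individual A effects disagree, the F ratio built on the observation-level error explodes as reps accumulate, while the F ratio built on the A⋅subject interaction stays modest.

```python
import random
from statistics import mean

random.seed(1)

# Made-up example: 3 subjects, 2 levels of A, 20 repeated measures per cell.
# Each subject has their own A effect, so the data carry only 3 independent
# pieces of information about the A effect, however many reps we take.
n_subj, n_rep = 3, 20
subj_base = [10.0, 12.0, 11.0]      # subject baselines
subj_eff = [1.5, -0.5, 0.8]         # each subject's own effect of A
data = {(s, a): [subj_base[s] + subj_eff[s] * a + random.gauss(0, 0.2)
                 for _ in range(n_rep)]
        for s in range(n_subj) for a in (0, 1)}

cell = {k: mean(v) for k, v in data.items()}
lev = {a: mean([cell[s, a] for s in range(n_subj)]) for a in (0, 1)}
subj = {s: mean([cell[s, a] for a in (0, 1)]) for s in range(n_subj)}
grand = mean(list(lev.values()))

# Sums of squares for the A effect, the A:subject interaction,
# and the observation-level (within-cell) residual.
ss_A = n_subj * n_rep * sum((lev[a] - grand) ** 2 for a in (0, 1))
ss_AxS = n_rep * sum((cell[s, a] - subj[s] - lev[a] + grand) ** 2
                     for s in range(n_subj) for a in (0, 1))
ss_within = sum((y - cell[k]) ** 2 for k, ys in data.items() for y in ys)

F_right = (ss_A / 1) / (ss_AxS / ((n_subj - 1) * (2 - 1)))       # RM-ANOVA
F_wrong = (ss_A / 1) / (ss_within / (n_subj * 2 * (n_rep - 1)))  # obs-level error

print(round(F_right, 2), round(F_wrong, 2))  # the wrong F is far larger
```

With these numbers the correct F is near 1 (three subjects, wildly disagreeing effects), while the observation-level F is in the hundreds, exactly the spurious significance described above.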
2) Does this difference between two-way ANOVA and RM-ANOVA correspond to testing two different nulls? If so, what exactly are they and why would we use different nulls in these two cases?
Nope, same null hypothesis. What's different is how we estimate the test statistic and its null distribution.
3) Two-way ANOVA's test can be understood as an F-test between two nested models: the full model, and the model without A. Can RM-ANOVA be understood in a similar way?
Yes, but perhaps not in the way you're hoping for. As you see in the output from aov, one way of thinking about these kinds of models is that they're really several models in one, with one model for each level.
One can fit the models for the higher levels individually by averaging the data over the lower levels. That is, an RM-ANOVA test for A is equivalent to a standard ANOVA on the averaged data. Then one can compare models in the usual way.
> library(plyr)
> d2 <- ddply(d, ~Xw1 + id, summarize, Y=mean(Y))
> a1 <- aov(Y ~ id, d2)
> a2 <- aov(Y ~ Xw1+id, d2)
> anova(a1, a2)
Analysis of Variance Table
Model 1: Y ~ id
Model 2: Y ~ Xw1 + id
Res.Df RSS Df Sum of Sq F Pr(>F)
1 40 55475
2 38 23717 2 31758 25.442 9.734e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Alternatively, one can fit the aov with all the data but without the term of interest, and then compare that fit with the full aov that includes the term of interest. To compare the models, though, you need to pick out the level of the model you've changed (here the id:Xw1 level), and then you can compare those two models.
> summary(aov(Y ~ 1 + Error(id/Xw1), d))
Error: id
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 19 31359 1650
Error: id:Xw1
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 40 166426 4161
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 120 340490 2837
> (F <- ((166426 - 71151)/2) / (71151/38))
[1] 25.44202
> pf(F, 2, 38, lower=FALSE)
[1] 9.732778e-08
(The 71151 here is the id:Xw1 residual SS from the corresponding fit that includes Xw1. Up to rounding it is three times the 23717 from the averaged-data fit, since each cell mean averaged three observations, and the F ratio and degrees of freedom come out the same.)
The difference comes from the fact that selecting observations with error == 0 gives you an unbalanced design. ezANOVA will take cell means for you and pretend that you have a balanced design, but aov will not.
Let's balance the design ourselves and see that the outputs match:
> data <- aggregate(rt ~ subnum + cue + flank, ANT[ANT$error == 0, ], mean)
> ezANOVA(data, dv=rt, wid=subnum, within=.(cue, flank))
$ANOVA
Effect DFn DFd F p p<.05 ges
2 cue 3 57 477.564650 2.435084e-40 * 0.86387868
3 flank 2 38 958.640865 3.040261e-33 * 0.90297213
4 cue:flank 6 114 4.047785 1.026734e-03 * 0.08633287
$`Mauchly's Test for Sphericity`
Effect W p p<.05
2 cue 0.8670854 0.77271988
3 flank 0.9088146 0.42293876
4 cue:flank 0.1506008 0.04917243 *
$`Sphericity Corrections`
Effect GGe p[GG] p[GG]<.05 HFe p[HF] p[HF]<.05
2 cue 0.9165014 3.647676e-37 * 1.086943 2.435084e-40 *
3 flank 0.9164345 1.182224e-30 * 1.009411 3.040261e-33 *
4 cue:flank 0.6261487 6.059761e-03 * 0.799682 2.641207e-03 *
> summary(aov(rt ~ cue * flank + Error(subnum / (cue * flank)), data))
Error: subnum
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 19 4247 223.5
Error: subnum:cue
Df Sum Sq Mean Sq F value Pr(>F)
cue 3 225486 75162 477.6 <2e-16 ***
Residuals 57 8971 157
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: subnum:flank
Df Sum Sq Mean Sq F value Pr(>F)
flank 2 330651 165326 958.6 <2e-16 ***
Residuals 38 6553 172
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: subnum:cue:flank
Df Sum Sq Mean Sq F value Pr(>F)
cue:flank 6 3357 559.5 4.048 0.00103 **
Residuals 114 15759 138.2
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
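The aggregation step above (R's aggregate(rt ~ subnum + cue + flank, ..., mean)) can be sketched in plain Python; the subject and condition names below are made up. Collapsing each subject × cue × flank cell to its mean leaves exactly one value per cell, which is what restores the balance ezANOVA assumes.

```python
# Made-up unbalanced data after dropping error trials:
# (subject, cue, flank) -> surviving reaction times, unequal counts per cell.
trials = {
    ("s1", "center", "congruent"):   [512.0, 498.0, 505.0],
    ("s1", "center", "incongruent"): [601.0, 590.0],            # one trial lost
    ("s2", "center", "congruent"):   [530.0, 522.0, 518.0, 527.0],
    ("s2", "center", "incongruent"): [615.0, 604.0, 610.0],
}

# Collapse each cell to its mean: the design is balanced again because
# every subject-by-condition cell now contributes exactly one number.
cell_means = {key: sum(rts) / len(rts) for key, rts in trials.items()}

for key, m in sorted(cell_means.items()):
    print(key, round(m, 1))
```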
Best Answer
Yes, they are equivalent. The question about the assumptions has never been directly addressed, though. It is sometimes indicated that the assumptions you cite for the ANOVA, when met, cover the normality assumption for the paired t-test. However, I still wonder: what about when the variables are not normal within each subgroup, but their differences (calculated as for the t-test) are normal? That should be enough, so the incongruence between these assumptions (as stated in every major statistics handbook) and the ones in your question bothers me too. ;)
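For the two-level case the equivalence can be checked directly. Here is a pure-Python sketch (toy numbers) showing that the two-level RM-ANOVA F statistic is exactly the square of the paired t statistic, which is why the p-values agree.

```python
import math
from statistics import mean, stdev

# Toy paired data: one measurement per subject under each of two conditions.
a = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 6.2, 5.3]
b = [5.6, 5.0, 6.4, 5.9, 5.5, 6.1, 6.5, 5.8]
n = len(a)

# Paired t statistic: t = mean(d) / (sd(d) / sqrt(n))
d = [y - x for x, y in zip(a, b)]
t = mean(d) / (stdev(d) / math.sqrt(n))

# Two-level RM-ANOVA: F = MS_condition / MS_(condition:subject)
grand = mean(a + b)
subj = [(x + y) / 2 for x, y in zip(a, b)]    # subject means
cond = [mean(a), mean(b)]                     # condition means
ss_cond = n * sum((m - grand) ** 2 for m in cond)   # df = 1
ss_err = sum((y - s - c + grand) ** 2               # df = n - 1
             for c, col in zip(cond, (a, b))
             for y, s in zip(col, subj))
F = (ss_cond / 1) / (ss_err / (n - 1))

print(round(F, 4), round(t ** 2, 4))  # identical up to floating point
```

Since F = t² with matching degrees of freedom (1 and n − 1 versus n − 1 for the t), the two procedures necessarily return the same p-value.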