Solved – ezANOVA vs. lme for factorial repeated-measures design: why do the results differ?


I have used lme and ezANOVA to analyse data from a 2$\times$3 repeated-measures experiment. In theory, these are two ways of performing the same analysis. However, the resulting $F$-statistics and DF differ, and I cannot work out why.

Here is the exact data and output:
I have a data set with 2 independent variables (marker_lang and congruency) and the dependent variable RT. Both IVs are repeated (within-subject) and completely crossed, giving 6 conditions overall. The data are not collapsed to cell means, so there are several data points per condition and subject.

Here is what I did with ezANOVA:

ezANOVA(subset(data.mark.afc, !is.na(afc.RT)), dv=afc.RT, wid=subjectID, 
within=.(marker_lang,congruency), within_full=.(marker_lang,congruency), detailed=1, type=3)

And the output:

$Anova
                  Effect DFn DFd          SSn        SSd            F            p p<.05          ges
1            (Intercept)   1  24 84879098.819 1892110.06 1076.6278430 1.881814e-21     * 0.9762134497
2            marker_lang   1  24    36392.804   80595.30   10.8371986 3.071336e-03     * 0.0172922873
3             congruency   2  48    25426.393   47319.45   12.8960382 3.292730e-05     * 0.0121448066
4 marker_lang:congruency   2  48     1160.152   48150.91    0.5782581 5.647333e-01       0.0005606399

Here is what I did with lme:

basemodel <- lme(data=subset(cdata, !is.na(afc.RT)), afc.RT~1,
random=~1|subjectID/congruency/marker_lang, method="ML")

langmodel <- update(basemodel, .~. + marker_lang)

langcongmodel <- update(langmodel, .~. + congruency)

fulmodel <- update(langcongmodel, .~. +marker_lang:congruency)

And the anova-tables for the lme analysis:

anova(fulmodel)
                   numDF denDF   F-value p-value
(Intercept)                1  4715 1112.3468  <.0001
marker_lang                1    72   24.8917  <.0001
congruency                 2    48    8.3902  0.0008
marker_lang:congruency     2    72    0.4475  0.6410

anova(basemodel, langmodel, langcongmodel, fulmodel)
          Model df      AIC      BIC    logLik   Test   L.Ratio p-value
basemodel         1  5 64203.43 64235.88 -32096.72                         
langmodel         2  6 64185.06 64224.00 -32086.53 1 vs 2 20.366313  <.0001
langcongmodel     3  8 64173.27 64225.19 -32078.64 2 vs 3 15.790082  0.0004
fulmodel          4 10 64176.38 64241.28 -32078.19 3 vs 4  0.892535  0.6400

I would expect the $F$, DF, and $p$-values for corresponding effects to be the same, which is not the case. This does not seem to be an issue of different ANOVA types: I tried the different types in ezANOVA, and none yields the same result as anova(fulmodel).

Any help/ideas will be greatly appreciated!

Best Answer

It looks like with ezANOVA you just did a classical Type III SS ANOVA (which uses the Anova command from the car package). With lme you're running an ANOVA on a linear mixed-effects model. They're not supposed to come out the same. ezANOVA can also do the latter (using lmer), but that isn't what you've asked for.

You should aggregate your data set (to one mean per subject per condition) before doing an ordinary ANOVA, and NOT aggregate it for lme. It's hard to tell whether you've done that. Ordinary ANOVA doesn't let you use multiple measures per condition per subject. This is a main reason for the difference in degrees of freedom between the first two analyses. The two are calculated very differently, but you should get very similar results from them if the data are aggregated. However, aggregating is not the right way to do the lme, and matching the ANOVA is not a goal you should be striving toward.
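As a minimal sketch of what aggregating to cell means looks like, the following base R snippet collapses trial-level data to one mean per subject and condition. The column names are taken from the question, but the data frame `trials` and its values are simulated for illustration only; they are not the poster's data.

```r
# Hypothetical trial-level data: 3 subjects x 2 x 3 conditions x 4 trials each
set.seed(1)
trials <- expand.grid(
  trial       = 1:4,
  congruency  = c("congruent", "neutral", "incongruent"),
  marker_lang = c("L1", "L2"),
  subjectID   = factor(1:3)
)
trials$afc.RT <- rnorm(nrow(trials), mean = 600, sd = 50)  # simulated RTs

# Collapse to one mean RT per subject and condition (cell means)
cell_means <- aggregate(afc.RT ~ subjectID + marker_lang + congruency,
                        data = trials, FUN = mean)

nrow(trials)      # trial-level rows
nrow(cell_means)  # one row per subject x condition
```

The aggregated data frame is what an ordinary repeated-measures ANOVA would work on, while the trial-level data frame is what would go into lme.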

Your last analysis doesn't even have F values in it (L. Ratio is a chi-square). How were you imagining the Fs could be the same? It's comparing the likelihood of the models and is different yet again.
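To make the chi-square point concrete, here is a small sketch that recomputes the p-value for the "3 vs 4" comparison from the numbers printed in the question's table. It assumes the usual likelihood-ratio reference distribution, a chi-square whose degrees of freedom equal the difference in parameter counts (the df column, 10 vs 8).

```r
# Likelihood-ratio statistic reported for "3 vs 4" in the anova() table
lr <- 0.892535

# Degrees of freedom: difference in the number of estimated parameters
df_diff <- 10 - 8

# Upper-tail chi-square probability; no F distribution is involved
p_value <- pchisq(lr, df = df_diff, lower.tail = FALSE)
round(p_value, 4)
```

The result agrees with the 0.6400 shown in the table.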

(Note: there are two ezANOVA programs. One is written by Chris Rorden and runs as a standalone data analysis program; the other is from the ez package in R, written by Mike Lawrence. For anyone doing a web search, this is clearly the latter, given the R command listed here.)

Related Solutions

Solved – Why do lme and aov return different results for repeated measures ANOVA in R

They are different because the lme model constrains the variance component of id to be non-negative. Looking at the raw anova table for all terms, we see that the mean square for id is less than that for the residuals.

> anova(lm1 <- lm(value~ factor+id, data=tau.base))

          Df  Sum Sq Mean Sq F value Pr(>F)
factor     3  0.6484 0.21614  1.3399 0.2694
id        21  3.1609 0.15052  0.9331 0.5526
Residuals 63 10.1628 0.16131

When we compute the variance components, this means that the variance due to id will be negative. My memory of expected mean squares is shaky, but the calculation, dividing by the number of observations per id (4, one for each level of factor), is something like

(0.15052-0.16131)/4 = -0.0026975.

This sounds odd but can happen. What it means is that the averages for each id are closer to each other than you would expect given the amount of residual variation in the model.
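A short sketch of this method-of-moments calculation, using the mean squares from the anova table above. It assumes a balanced design in which the divisor is the number of observations per id; the 3 degrees of freedom for factor suggest 4 levels, so 4 is used here, but tau.base itself is not shown. The sign of the estimate, which is the point, does not depend on the divisor.

```r
ms_id    <- 0.15052  # mean square for id, from the anova table
ms_resid <- 0.16131  # residual mean square, from the anova table
n_per_id <- 4        # assumed: one observation per level of factor

# Moment estimate of the id variance component
var_id <- (ms_id - ms_resid) / n_per_id
var_id
```

Because the mean square for id is smaller than the residual mean square, the estimate is negative whatever positive divisor is used.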

In contrast, using lme constrains this variance to be non-negative, so the estimate ends up essentially at zero.

> summary(lme1 <- lme(value ~ factor, data = tau.base, random = ~1|id))
...
Random effects:
 Formula: ~1 | id
        (Intercept)  Residual
StdDev: 3.09076e-05 0.3982667

This reports standard deviations; squaring them to get the variances yields 9.553e-10 for the id variance and 0.1586164 for the residual variance.

Now, you should know that using aov for repeated measures is only appropriate if you believe that the correlation between all pairs of repeated measures is identical; this is called compound symmetry. (Technically, sphericity is required but this is sufficient for now.) One reason to use lme over aov is that it can handle different kinds of correlation structures.

In this particular data set, the estimate for this correlation is negative; this helps explain how the mean square for id was less than the residual mean square. A negative correlation means that if an individual's first measurement was below average, then, on average, their second would be above average, making the overall averages for the individuals less variable than we would expect if the correlation were zero or positive.
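As a rough illustration, the correlation implied by the mean squares quoted earlier can be computed with the classical ANOVA estimator of the intraclass correlation. This is a moment-based sketch that assumes a balanced design with 4 observations per id; it is not the estimate reported by gls, which is not shown here.

```r
ms_id    <- 0.15052  # mean square for id, from the anova table
ms_resid <- 0.16131  # residual mean square, from the anova table
k        <- 4        # assumed number of observations per id

# ANOVA estimator of the common (compound-symmetry) correlation
rho <- (ms_id - ms_resid) / (ms_id + (k - 1) * ms_resid)
rho
```

The value is small and negative, consistent with the mean square for id falling just below the residual mean square.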

Using lme with a random effect is equivalent to fitting a compound symmetry model where that correlation is forced to be non-negative; we can fit a model where the correlation is allowed to be negative using gls:

> anova(gls1 <- gls(value ~ factor, correlation=corCompSymm(form=~1|id),
                    data=tau.base))
Denom. DF: 84 
            numDF   F-value p-value
(Intercept)     1 199.55223  <.0001
factor          3   1.33985   0.267

This ANOVA table agrees with the table from the aov fit and from the lm fit.

OK, so what? Well, if you believe that the variance from id and the correlation between observations should be non-negative, the lme fit is actually more appropriate than the fit using aov or lm as its estimate of the residual variance is slightly better. However, if you believe the correlation between observations could be negative, aov or lm or gls is better.

You may also be interested in exploring the correlation structure further; to look at a general correlation structure, you'd do something like

gls2 <- gls(value ~ factor, correlation=corSymm(form=~unclass(factor)|id),
data=tau.base)

Here I limit the output to the correlation structure. The values 1 to 4 represent the four levels of factor; we see that levels 1 and 4 have a fairly strong negative correlation:

> summary(gls2)
...
Correlation Structure: General
 Formula: ~unclass(factor) | id 
 Parameter estimate(s):
 Correlation: 
  1      2      3     
2  0.049              
3 -0.127  0.208       
4 -0.400  0.146 -0.024

One way to choose between these models is with a likelihood ratio test; this shows that the random effects model and the general correlation structure model aren't statistically significantly different; when that happens the simpler model is usually preferred.

> anova(lme1, gls2)
     Model df      AIC      BIC    logLik   Test  L.Ratio p-value
lme1     1  6 108.0794 122.6643 -48.03972                        
gls2     2 11 111.9787 138.7177 -44.98936 1 vs 2 6.100725  0.2965

Solved – repeated measures factorial design

Yes, it's possible, but it's hard to get a time trend factor that way; it might be easier as a multilevel model. You can do this with SAS proc mixed:

proc mixed data = mydata; 
class  unit A B; 
model outcome = A B time ; 
repeated /subject = unit type = cs rcorr; 
run;

The data should be in long format, so outcome is your outcome variable, and unit identifies the unit - each unit will have three rows in the dataset.

You might want to add A*B to the model line (but you're going to be close to running out of degrees of freedom).

You could also treat time as categorical by adding it to the class line.

Sometimes I like to play with a simpler model, to test that they really are equivalent:

proc mixed data = mydata; 
class  unit time; 
model outcome =  time ; 
repeated /subject = unit type = cs rcorr; 
run;

This model is just a repeated measures anova, with time as the only factor. You should get the same (or very nearly the same) answer doing it both ways.
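For readers working in R rather than SAS, a minimal sketch of the classical repeated-measures ANOVA side of that comparison might look as follows. The data are simulated, and the names `mydata`, `unit`, `time`, and `outcome` simply mirror the SAS example; this is an illustration under those assumptions, not a translation of the proc mixed call.

```r
# Hypothetical long-format data: 10 units, each measured at 3 time points
set.seed(42)
mydata <- expand.grid(time = factor(1:3), unit = factor(1:10))
unit_effect <- rnorm(10, sd = 2)  # simulated per-unit shift
mydata$outcome <- 5 + as.integer(mydata$time) +
  unit_effect[as.integer(mydata$unit)] + rnorm(nrow(mydata))

# Repeated-measures ANOVA with time as the only within-unit factor
rm_fit <- aov(outcome ~ time + Error(unit/time), data = mydata)
summary(rm_fit)
```

The F test for time appears in the within-unit stratum of the summary, which is the result one would compare against the compound-symmetry mixed model.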
