Solved – Different results obtained with lmer() and aov() for three-way repeated-measures experiment

Tags: anova, lme4-nlme, mixed model, repeated measures

My question concerns results from one of my experiments, where analyzing the data with an RM-ANOVA (using either ezANOVA or aov on the aggregated data in R) yields substantially different results from fitting the equivalent model on the raw data with lmer.

In the experiment, participants are presented with 40 different items that they rate for attractiveness on a 200-point scale, both before a manipulation takes place and afterwards (this rating is the DV). The items are manipulated on two independent variables, both crossed within subject:

att_cond: attentional cueing (factor; cued/uncued)

gng_cond: response (go/no-go) (factor; responded/unresponded)

In short, participants rate all the items on the 200-point scale and then receive a training in which half of the items receive an attentional cue while the other half remain uncued. Similarly, half of the items are responded to during the training (go items), while the other half are not responded to (no-go items).

Thus, my data has the following structure:

>head(subset_go_nogo[c(1:3, 4,5,7)])
subjectID FoodItem gng_cond att_cond timepoint rating
1 3 63 go 0 0 136
2 3 23 no-go 1 0 179
3 3 55 no-go 1 0 144
4 3 7 go 0 0 165
5 3 42 no-go 1 0 144
6 3 16 no-go 0 0 180

>str(subset_go_nogo[c(1:3, 4,5,7)])
'data.frame': 3280 obs. of 6 variables:
$ subjectID: Factor w/ 41 levels "10","11","13",..: 19 19 19 19 19 19 19 19 19 19 ...
$ FoodItem : Factor w/ 80 levels "1","10","11",..: 60 16 51 67 37 8 13 19 61 32 ...
$ gng_cond : Factor w/ 2 levels "no-go","go": 2 1 1 2 1 1 2 1 2 1 ...
$ att_cond : Factor w/ 2 levels "0","1": 1 2 2 1 2 1 2 1 2 1 ...
$ timepoint: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ rating : num 136 179 144 165 144 180 200 173 183 175 ...

Therefore, I have 40 ratings per participant at each timepoint (N = 41). I am especially interested in the timepoint*gng_cond*att_cond three-way interaction, for which I have 10 observations per participant in each cell, without any missing data (e.g.):

>with(subset_go_nogo, table(gng_cond, att_cond, timepoint, subjectID))

, , timepoint = 0, subjectID = 10
att_cond
gng_cond 0 1
no-go 10 10
go 10 10

, , timepoint = 1, subjectID = 10
att_cond
gng_cond 0 1
no-go 10 10
go 10 10

I believe that my design is balanced and that I have crossed random effects of subjectID and FoodItem. However, to simplify the problem I will demonstrate what I get on the model with only subjectID as a random intercept.

I will provide a lot of output in the following, but here is a short summary of the main problem first:
For the first pair of models (the random-intercept lmer model and the aov model with a subjectID-only error term), the two approaches yield identical results and p-values. However, when the error terms for the within-subject predictors are added, the ANOVA yields a significant three-way interaction (F = 6.446, p = 0.0151) while the mixed-effects model indicates a non-significant interaction (F = 3.629, p = 0.0568).
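(A note on the output shown below: the Satterthwaite df and Pr(>|t|) columns in the lmer summaries, and the "t-tests use Satterthwaite approximations" header further down, indicate that lmerTest was loaded on top of lme4. The assumed setup is therefore roughly:)

# Assumed setup, inferred from the Satterthwaite df/p-values in the summaries below
library(lme4)
library(lmerTest)   # adds df and Pr(>|t|) columns to summary() of lmer fits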

If I start off by fitting the most basic version of the lmer model including the 3-way interaction I get:

> lmer.1 <- lmer(rating ~ timepoint*att_cond*gng_cond + (1 | subjectID), data = d)

yielding the following model summary:

summary(lmer.1)

Random effects:
  Groups    Name        Variance Std.Dev.
subjectID (Intercept) 279.4    16.71   
Residual              922.1    30.37   
Number of obs: 3280, groups:  subjectID, 41

Fixed effects:
  Estimate Std. Error            df t value Pr(>|t|)    
(Intercept)                     140.88293    2.66360   40.00000  52.892   <2e-16 ***
timepoint1                        6.92561    0.53021 3232.00000  13.062   <2e-16 ***
att_cond1                         0.13354    0.53021 3232.00000   0.252   0.8012    
gng_cond1                        -1.35854    0.53021 3232.00000  -2.562   0.0104 *  
timepoint1:att_cond1             -0.02744    0.53021 3232.00000  -0.052   0.9587    
timepoint1:gng_cond1              1.36220    0.53021 3232.00000   2.569   0.0102 *  
att_cond1:gng_cond1               1.00549    0.53021 3232.00000   1.896   0.0580 .  
timepoint1:att_cond1:gng_cond1   -0.98963    0.53021 3232.00000  -1.866   0.0621 .  
---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

and mean squares:

> anova(lmer.1)
Analysis of Variance Table
Df Sum Sq Mean Sq F value
timepoint 1 157322 157322 170.6164
att_cond 1 58 58 0.0634
gng_cond 1 6054 6054 6.5652
timepoint:att_cond 1 2 2 0.0027
timepoint:gng_cond 1 6086 6086 6.6006
att_cond:gng_cond 1 3316 3316 3.5963
timepoint:att_cond:gng_cond 1 3212 3212 3.4838

The equivalent RM-ANOVA would be:

aov1 <- aov(rating ~ timepoint*att_cond*gng_cond + Error(subjectID),data=d_aggregated)
summary(aov1)

Error: subjectID
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 40 930832   23271               

Error: Within
Df  Sum Sq Mean Sq F value Pr(>F)    
timepoint                      1  157322  157322 170.616 <2e-16 ***
att_cond                       1      58      58   0.063 0.8012    
gng_cond                       1    6054    6054   6.565 0.0104 *  
timepoint:att_cond             1       2       2   0.003 0.9587    
timepoint:gng_cond             1    6086    6086   6.601 0.0102 *  
att_cond:gng_cond              1    3316    3316   3.596 0.0580 .  
timepoint:att_cond:gng_cond    1    3212    3212   3.484 0.0621 .  
Residuals                   3232 2980166     922                   
---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

So far so good: the two models yield exactly the same output (apart from rounding).

However, the matter becomes more complicated when I fit a model with the full within-subject error structure, that is, an lmer model corresponding to the following aov specification:

aov2 <- aov(rating ~ timepoint*att_cond*gng_cond + Error(subjectID/(timepoint*att_cond*gng_cond)),data=d_aggregated)
summary(aov2)

Error: subjectID
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 40 930832   23271               

Error: subjectID:timepoint
            Df Sum Sq Mean Sq F value   Pr(>F)    
timepoint  1 157322  157322   46.41 3.43e-08 ***
  Residuals 40 135601    3390                     
---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: subjectID:att_cond
Df Sum Sq Mean Sq F value Pr(>F)
att_cond   1     58    58.5   0.125  0.725
Residuals 40  18675   466.9               

Error: subjectID:gng_cond
            Df Sum Sq Mean Sq F value Pr(>F)   
gng_cond   1   6054    6054    10.6 0.0023 **
  Residuals 40  22834     571                  
---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: subjectID:timepoint:att_cond
Df Sum Sq Mean Sq F value Pr(>F)
timepoint:att_cond  1      2     2.5   0.006  0.941
Residuals          40  17716   442.9               

Error: subjectID:timepoint:gng_cond
                     Df Sum Sq Mean Sq F value  Pr(>F)   
timepoint:gng_cond  1   6086    6086   11.39 0.00165 **
  Residuals          40  21377     534                   
---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: subjectID:att_cond:gng_cond
                    Df Sum Sq Mean Sq F value Pr(>F)  
att_cond:gng_cond  1   3316    3316   6.328  0.016 *
  Residuals         40  20963     524                 
---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: subjectID:timepoint:att_cond:gng_cond
                              Df Sum Sq Mean Sq F value Pr(>F)  
timepoint:att_cond:gng_cond  1   3212    3212   6.446 0.0151 *
  Residuals                   40  19934     498                 
---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: Within
           Df  Sum Sq Mean Sq F value Pr(>F)
Residuals 2952 2723066   922.4   

By my account, the corresponding lmer representation of this model would be:

lmer.2 <- lmer(rating ~ timepoint*att_cond*gng_cond +
               (1 + timepoint*att_cond*gng_cond | subjectID), data = d)

or, after reading THIS answer by Jake Westfall to another post:

lmer.3 <- lmer(rating ~ timepoint*att_cond*gng_cond + (1 | subjectID) +
               (0 + timepoint | subjectID) + (0 + att_cond | subjectID) +
               (0 + gng_cond | subjectID) + (0 + timepoint:att_cond | subjectID) +
               (0 + timepoint:gng_cond | subjectID) + (0 + att_cond:gng_cond | subjectID),
               data = d, control = lmerControl(optCtrl = list(maxfun = 1e9)))

giving:

Random effects:
 Groups      Name                     Variance  Std.Dev.  Corr             
 subjectID   (Intercept)              1.065e-07 3.263e-04                  
 subjectID.1 timepoint0               0.000e+00 0.000e+00                  
             timepoint1               5.772e-09 7.598e-05   NaN            
 subjectID.2 att_cond0                0.000e+00 0.000e+00                  
             att_cond1                6.738e-12 2.596e-06   NaN            
 subjectID.3 gng_condno-go            0.000e+00 0.000e+00                  
             gng_condgo               2.876e-11 5.363e-06   NaN            
 subjectID.4 timepoint0:att_cond0     1.028e+01 3.206e+00                  
             timepoint1:att_cond0     3.109e+01 5.576e+00  -1.00           
             timepoint0:att_cond1     9.653e+00 3.107e+00   1.00 -1.00     
             timepoint1:att_cond1     9.413e+01 9.702e+00  -1.00  1.00 -1.00
 subjectID.5 timepoint0:gng_condno-go 8.150e+01 9.028e+00                  
             timepoint1:gng_condno-go 8.642e+01 9.296e+00   1.00           
             timepoint0:gng_condgo    8.465e+01 9.200e+00   1.00  1.00     
             timepoint1:gng_condgo    2.007e+02 1.417e+01   1.00  1.00  1.00
 subjectID.6 att_cond0:gng_condno-go  1.672e+02 1.293e+01                  
             att_cond1:gng_condno-go  1.266e+02 1.125e+01   1.00           
             att_cond0:gng_condgo     1.406e+02 1.186e+01   1.00  1.00     
             att_cond1:gng_condgo     1.840e+02 1.356e+01   1.00  1.00  1.00
 Residual                             8.850e+02 2.975e+01                  
Number of obs: 3280, groups:  subjectID, 41

Fixed effects:
Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)                     140.88293    2.60631   43.00000  54.054  < 2e-16 ***
timepoint1                        6.92561    1.01104   41.00000   6.850 2.75e-08 ***
att_cond1                         0.13354    0.54501  190.00000   0.245   0.8067    
gng_cond1                        -1.35854    0.55763  114.00000  -2.436   0.0164 *  
timepoint1:att_cond1             -0.02744    0.54272  207.00000  -0.051   0.9597    
timepoint1:gng_cond1              1.36220    0.55089  120.00000   2.473   0.0148 *  
att_cond1:gng_cond1               1.00549    0.53601  250.00000   1.876   0.0618 .  
timepoint1:att_cond1:gng_cond1   -0.98963    0.51944 3171.00000  -1.905   0.0568 .  
---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
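As an aside, the doubled entries per term in the random-effects table above (e.g. timepoint0 and timepoint1, or the four timepoint:att_cond combinations) arise because the predictors are factors: in a (0 + factor | subjectID) term, lmer expands the factor into one indicator column per level and estimates a full covariance block for those columns rather than a single variance component. A small sketch of that expansion, using the data frame d as above:

# With the default factor coding, (0 + timepoint | subjectID) uses these columns,
# one indicator per level, hence the 2x2 covariance block per subject in the output above
head(model.matrix(~ 0 + timepoint, data = d))

# and (0 + timepoint:att_cond | subjectID) expands to four columns (a 4x4 block)
head(model.matrix(~ 0 + timepoint:att_cond, data = d))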

However, as mentioned above, in both of these cases (lmer.2 and lmer.3) I get results that differ markedly from those of aov2.

My questions regarding this rather long post would be:

1) Why and how do the estimates differ?

2) Is there any situation in which the RM-ANOVA might actually be preferable, given that it yields significant effects here while the mixed model does not, or is this difference due to mis-specification or a Type I error?

3) Does the model I was using previously, lmer.2 <- lmer(rating ~ timepoint*att_cond*gng_cond + (1 + timepoint*att_cond*gng_cond | subjectID), data = d), make sense given the data, or should the random structure not include the three-way interaction (i.e. lmer.3 above)?

EDIT

After feedback from user amoeba, I coded the contrasts manually following the advice from Jake Westfall in the post linked above, and included a three-way random term in lmer.2 and lmer.3, i.e.:

lmer.2 <- lmer(rating ~ timepoint_n*att_cond_n*gng_cond_n + (1 +  
timepoint_n*att_cond_n*gng_cond_n || subjectID), data = d)

and

lmer.3 <- lmer(rating ~ timepoint_n*att_cond_n*gng_cond_n + (1 | subjectID) +
               (0 + timepoint_n | subjectID) + (0 + att_cond_n | subjectID) +
               (0 + gng_cond_n | subjectID) + (0 + timepoint_n:att_cond_n | subjectID) +
               (0 + timepoint_n:gng_cond_n | subjectID) + (0 + att_cond_n:gng_cond_n | subjectID) +
               (0 + att_cond_n:gng_cond_n:timepoint_n | subjectID),
               data = d, control = lmerControl(optCtrl = list(maxfun = 1e9)))

with manual coding of numeric -1 and 1 for each factor (see the sketch below). The outcome of both model specifications for lmer is identical, so I only provide lmer.3. However, it is still quite different from what I get from aov (see above).
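For reference, the recoding could look something like this (a sketch based on the factor levels shown in the str() output above; which level gets +1 is my assumption and only flips the sign of the corresponding estimates):

# Numeric -1/+1 (sum-to-zero) codes for the three two-level factors
d$timepoint_n <- ifelse(d$timepoint == "1",  1, -1)
d$att_cond_n  <- ifelse(d$att_cond  == "1",  1, -1)
d$gng_cond_n  <- ifelse(d$gng_cond  == "go", 1, -1)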

> summary(lmer.3)
Linear mixed model fit by REML 
t-tests use  Satterthwaite approximations to degrees of freedom ['lmerMod']
Formula: rating ~ timepoint_n * att_cond_n * gng_cond_n + (1 | subjectID) +
    (0 + timepoint_n | subjectID) + (0 + att_cond_n | subjectID) +
    (0 + gng_cond_n | subjectID) + (0 + timepoint_n:att_cond_n | subjectID) +
    (0 + timepoint_n:gng_cond_n | subjectID) + (0 + att_cond_n:gng_cond_n | subjectID) +
    (0 + att_cond_n:gng_cond_n:timepoint_n | subjectID)
Data: d
Control: lmerControl(optCtrl = list(maxfun = 1e+09))

REML criterion at convergence: 31759.4

Scaled residuals: 
  Min      1Q  Median      3Q     Max 
-4.0614 -0.5976 -0.0071  0.6375  3.8063 

Random effects:
  Groups      Name                              Variance Std.Dev.
subjectID   (Intercept)                       279.75   16.726  
subjectID.1 timepoint_n                        31.24    5.589  
subjectID.2 att_cond_n                          0.00    0.000  
subjectID.3 gng_cond_n                          0.00    0.000  
subjectID.4 timepoint_n:att_cond_n              0.00    0.000  
subjectID.5 timepoint_n:gng_cond_n              0.00    0.000  
subjectID.6 att_cond_n:gng_cond_n               0.00    0.000  
subjectID.7 att_cond_n:gng_cond_n:timepoint_n   0.00    0.000  
Residual                                      891.15   29.852  
Number of obs: 3280, groups:  subjectID, 41

Fixed effects:
  Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)                        140.88293    2.66360   40.00000  52.892  < 2e-16 ***
timepoint_n                         -6.92561    1.01664   40.00000  -6.812 3.43e-08 ***
att_cond_n                          -0.13354    0.52124 3192.00000  -0.256  0.79782    
gng_cond_n                           1.35854    0.52124 3192.00000   2.606  0.00919 ** 
timepoint_n:att_cond_n              -0.02744    0.52124 3192.00000  -0.053  0.95802    
timepoint_n:gng_cond_n               1.36220    0.52124 3192.00000   2.613  0.00901 ** 
att_cond_n:gng_cond_n                1.00549    0.52124 3192.00000   1.929  0.05382 .  
timepoint_n:att_cond_n:gng_cond_n    0.98963    0.52124 3192.00000   1.899  0.05771 .  
---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
  (Intr) tmpnt_ att_c_ gng_c_ tmpnt_n:t__ tmpnt_n:g__ a__:__
timepoint_n 0.000                                                     
att_cond_n  0.000  0.000                                              
gng_cond_n  0.000  0.000  0.000                                       
tmpnt_n:t__ 0.000  0.000  0.000  0.000                                
tmpnt_n:g__ 0.000  0.000  0.000  0.000  0.000                         
att_cnd_:__ 0.000  0.000  0.000  0.000  0.000       0.000             
tmpn_:__:__ 0.000  0.000  0.000  0.000  0.000       0.000       0.000  

Best Answer

Using Jake Westfall's EMS function (documented HERE), I estimated the variance components via the expected mean squares. The random-effect variances that lmer estimated as 0 above are indeed negative when derived via EMS:

cbind(rev(ans[c(grep("e",names(ans)),grep("s",names(ans)))])/
    c(1,2,2,2,4,4,4,5,1))


s        69.838600
a:s       7.711213
b:s      -1.423696
c:s      -1.098723
a:b:s    -1.498620
a:c:s    -1.212590
b:c:s    -1.244919
a:b:c:s  -2.650683
e       922.447967
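A rough sanity check of the sign, directly from the aov2 table above: under the classical expected-mean-squares logic, the mean square of the subjectID:timepoint:att_cond:gng_cond residual stratum estimates roughly sigma^2_error + n * sigma^2(subject x three-way), with n = 10 items per subject-by-cell here, so a method-of-moments estimate of that component is (MS_stratum - MS_Within)/n. The exact coefficient depends on the parameterization (which is why the EMS-derived values above are on a different scale), but the sign is the point:

# MS of the subject x 3-way residual stratum (~498) is smaller than the Within MS (~922),
# so the implied variance component is negative
(498 - 922) / 10   # approx. -42.4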

When running lmer.4 on the aggregated data set, lmer gives results identical to those from aov2 (except for the three-way interaction, which is nonetheless very similar).
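Here d_aggr is presumably the cell-means data set, with one mean rating per subject and condition cell (328 rows = 41 subjects x 8 cells; cf. the number of observations in the summary below). A minimal sketch of how such an aggregation could be produced (my assumption; the exact code used may have differed):

# Collapse the 10 item ratings per subject x cell to a single cell mean
d_aggr <- aggregate(rating ~ subjectID + timepoint_n + att_cond_n + gng_cond_n,
                    data = d, FUN = mean)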

> lmer.4 <- lmer(rating ~ timepoint_n*att_cond_n*gng_cond_n + (1 | subjectID) +
                 (0 + timepoint_n | subjectID) + (0 + att_cond_n | subjectID) +
                 (0 + gng_cond_n | subjectID) + (0 + timepoint_n:att_cond_n | subjectID) +
                 (0 + timepoint_n:gng_cond_n | subjectID) + (0 + att_cond_n:gng_cond_n | subjectID),
                 data = d_aggr, control = lmerControl(optCtrl = list(maxfun = 1e9)))
> summary(lmer.4)
Linear mixed model fit by REML 
t-tests use  Satterthwaite approximations to degrees of freedom ['lmerMod']
Formula: rating ~ timepoint_n * att_cond_n * gng_cond_n + (1 | subjectID) +
    (0 + timepoint_n | subjectID) + (0 + att_cond_n | subjectID) +
    (0 + gng_cond_n | subjectID) + (0 + timepoint_n:att_cond_n | subjectID) +
    (0 + timepoint_n:gng_cond_n | subjectID) + (0 + att_cond_n:gng_cond_n | subjectID)
Data: d_aggr
Control: lmerControl(optCtrl = list(maxfun = 1e+09))

REML criterion at convergence: 2438.7

Scaled residuals: 
  Min      1Q  Median      3Q     Max 
-4.0726 -0.1788 -0.0138  0.2457  4.3407 

Random effects:
  Groups      Name                   Variance  Std.Dev. 
subjectID   (Intercept)            2.850e+02 1.688e+01
subjectID.1 timepoint_n            3.651e+01 6.042e+00
subjectID.2 att_cond_n             2.257e-11 4.750e-06
subjectID.3 gng_cond_n             1.269e+00 1.126e+00
subjectID.4 timepoint_n:att_cond_n 0.000e+00 0.000e+00
subjectID.5 timepoint_n:gng_cond_n 8.132e-01 9.018e-01
subjectID.6 att_cond_n:gng_cond_n  6.839e-01 8.270e-01
Residual                           4.694e+01 6.851e+00
Number of obs: 328, groups:  subjectID, 41

Fixed effects:
                                   Estimate   Std. Error      df  t value Pr(>|t|)    
(Intercept)                       140.88293    2.66360  40.00000  52.892  < 2e-16 ***
timepoint_n                        -6.92561    1.01664  40.00000  -6.812 3.43e-08 ***
att_cond_n                         -0.13354    0.37828 120.00000  -0.353  0.72470    
gng_cond_n                          1.35854    0.41718  40.00000   3.256  0.00230 ** 
timepoint_n:att_cond_n             -0.02744    0.37828 120.00000  -0.073  0.94230    
timepoint_n:gng_cond_n              1.36220    0.40365  40.00000   3.375  0.00165 ** 
att_cond_n:gng_cond_n               1.00549    0.39972  40.00000   2.515  0.01601 *  
timepoint_n:att_cond_n:gng_cond_n   0.98963    0.37828 120.00000   2.616  0.01004 *  
  ---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
(Intr) tmpnt_ att_c_ gng_c_ tmpnt_n:t__ tmpnt_n:g__ a__:__
timepoint_n 0.000                                                     
att_cond_n  0.000  0.000                                              
gng_cond_n  0.000  0.000  0.000                                       
tmpnt_n:t__ 0.000  0.000  0.000  0.000                                
tmpnt_n:g__ 0.000  0.000  0.000  0.000  0.000                         
att_cnd_:__ 0.000  0.000  0.000  0.000  0.000       0.000             
tmpn_:__:__ 0.000  0.000  0.000  0.000  0.000       0.000       0.000 

Therefore, it seems that because lmer is not able (or allowed) to estimate negative variance components, it cannot reproduce the aov fit, and the results differ. The difference disappears when the lmer representation of the model is run on the aggregated data, which eliminates the negatively estimated variance components.
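(As an aside, and not part of the original analysis: a convenient cross-check of the RM-ANOVA from the trial-level data is afex::aov_ez, which by default averages the replicates within each subject-by-cell and then fits the same univariate repeated-measures ANOVA as aov2 above. A sketch, assuming the afex package:)

# Cross-check sketch (assumes afex; for balanced data this should reproduce the aov2 F-tests)
library(afex)
aov_ez(id = "subjectID", dv = "rating", data = d,
       within = c("timepoint", "att_cond", "gng_cond"))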
