R Mixed Models – Equivalence Between Repeated Measures ANOVA and Mixed Model: lmer vs lme

Tags: anova, lme4-nlme, mixed-model, r, repeated-measures

I am having trouble obtaining equivalent results from an aov between-within repeated measures model and an lmer mixed model.

My data and script look as follows:

data=read.csv("https://www.dropbox.com/s/zgle45tpyv5t781/fitness.csv?dl=1")
data$id=factor(data$id)
data
   id  FITNESS      TEST PULSE
1   1  pilates   CYCLING    91
2   2  pilates   CYCLING    82
3   3  pilates   CYCLING    65
4   4  pilates   CYCLING    90
5   5  pilates   CYCLING    79
6   6  pilates   CYCLING    84
7   7 aerobics   CYCLING    84
8   8 aerobics   CYCLING    77
9   9 aerobics   CYCLING    71
10 10 aerobics   CYCLING    91
11 11 aerobics   CYCLING    72
12 12 aerobics   CYCLING    93
13 13    zumba   CYCLING    63
14 14    zumba   CYCLING    87
15 15    zumba   CYCLING    67
16 16    zumba   CYCLING    98
17 17    zumba   CYCLING    63
18 18    zumba   CYCLING    72
19  1  pilates   JOGGING   136
20  2  pilates   JOGGING   119
21  3  pilates   JOGGING   126
22  4  pilates   JOGGING   108
23  5  pilates   JOGGING   122
24  6  pilates   JOGGING   101
25  7 aerobics   JOGGING   116
26  8 aerobics   JOGGING   142
27  9 aerobics   JOGGING   137
28 10 aerobics   JOGGING   134
29 11 aerobics   JOGGING   131
30 12 aerobics   JOGGING   120
31 13    zumba   JOGGING    99
32 14    zumba   JOGGING    99
33 15    zumba   JOGGING    98
34 16    zumba   JOGGING    99
35 17    zumba   JOGGING    87
36 18    zumba   JOGGING    89
37  1  pilates SPRINTING   179
38  2  pilates SPRINTING   195
39  3  pilates SPRINTING   188
40  4  pilates SPRINTING   189
41  5  pilates SPRINTING   173
42  6  pilates SPRINTING   193
43  7 aerobics SPRINTING   184
44  8 aerobics SPRINTING   179
45  9 aerobics SPRINTING   179
46 10 aerobics SPRINTING   174
47 11 aerobics SPRINTING   164
48 12 aerobics SPRINTING   182
49 13    zumba SPRINTING   111
50 14    zumba SPRINTING   103
51 15    zumba SPRINTING   113
52 16    zumba SPRINTING   118
53 17    zumba SPRINTING   127
54 18    zumba SPRINTING   113

Basically, 3 x 6 subjects (id) were each assigned to one of three different FITNESS workout schemes (a between-subjects factor), and their PULSE was measured after carrying out each of three different types of endurance TEST (a within-subjects factor).
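
As a quick check that the design is balanced -- every FITNESS-by-TEST cell should contain six observations -- something like the following base-R sketch can be used:

xtabs(~ FITNESS + TEST, data=data)  # a balanced 3 x 3 table with 6 observations per cell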

I then fitted the following aov model:

library(afex)
library(car)
set_sum_contrasts()
fit1 = aov(PULSE ~ FITNESS*TEST + Error(id/TEST),data=data)
summary(fit1)
Error: id
          Df Sum Sq Mean Sq F value   Pr(>F)    
FITNESS    2  14194    7097   115.1 7.92e-10 ***
Residuals 15    925      62                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: id:TEST
             Df Sum Sq Mean Sq F value   Pr(>F)    
TEST          2  57459   28729   253.7  < 2e-16 ***
FITNESS:TEST  4   8200    2050    18.1 1.16e-07 ***
Residuals    30   3397     113                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The result I obtain using

set_sum_contrasts()
fit2=aov.car(PULSE ~ FITNESS*TEST+Error(id/TEST),data=data,type=3,return="Anova")
summary(fit2)

is identical to the aov result above.

A mixed model run using nlme gives a directly equivalent result, e.g., using lme:

library(nlme)
lme1=lme(PULSE ~ FITNESS*TEST, random=~1|id, correlation=corCompSymm(form=~1|id),data=data)
anova(lme1)
             numDF denDF   F-value p-value
(Intercept)      1    30 12136.126  <.0001
FITNESS          2    15   115.127  <.0001
TEST             2    30   253.694  <.0001
FITNESS:TEST     4    30    18.103  <.0001


summary(lme1)
Linear mixed-effects model fit by REML
 Data: data 
       AIC      BIC    logLik
  371.5375 393.2175 -173.7688

Random effects:
 Formula: ~1 | id
        (Intercept) Residual
StdDev:    1.699959 9.651662

Correlation Structure: Compound symmetry
 Formula: ~1 | id 
 Parameter estimate(s):
       Rho 
-0.2156615 
Fixed effects: PULSE ~ FITNESS * TEST 
                                 Value Std.Error DF   t-value p-value
(Intercept)                   81.33333  4.000926 30 20.328628  0.0000
FITNESSpilates                 0.50000  5.658164 15  0.088368  0.9308
FITNESSzumba                  -6.33333  5.658164 15 -1.119327  0.2806
TESTJOGGING                   48.66667  6.143952 30  7.921069  0.0000
TESTSPRINTING                 95.66667  6.143952 30 15.570868  0.0000
FITNESSpilates:TESTJOGGING   -11.83333  8.688861 30 -1.361897  0.1834
FITNESSzumba:TESTJOGGING     -28.50000  8.688861 30 -3.280062  0.0026
FITNESSpilates:TESTSPRINTING   8.66667  8.688861 30  0.997446  0.3265
FITNESSzumba:TESTSPRINTING   -56.50000  8.688861 30 -6.502579  0.0000

Or using gls:

library(nlme)
gls1=gls(PULSE ~ FITNESS*TEST, correlation=corCompSymm(form=~1|id),data=data)
anova(gls1)

However, the result I obtain using lme4's lmer is different:

library(lme4)
set_sum_contrasts()
fit3=lmer(PULSE ~ FITNESS*TEST+(1|id),data=data)
summary(fit3)
Linear mixed model fit by REML ['lmerMod']
Formula: PULSE ~ FITNESS * TEST + (1 | id)
   Data: data

REML criterion at convergence: 362.4

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept)  0.00    0.0     
 Residual             96.04    9.8     
...

Anova(fit3,test.statistic="F",type=3)
Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df)

Response: PULSE
                    F Df Df.res    Pr(>F)    
(Intercept)  7789.360  1     15 < 2.2e-16 ***
FITNESS        73.892  2     15 1.712e-08 ***
TEST          299.127  2     30 < 2.2e-16 ***
FITNESS:TEST   21.345  4     30 2.030e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Does anybody have any thoughts on what I am doing wrong with the lmer model, or on where the difference comes from? Could it have anything to do with lmer not allowing negative intraclass correlations, or something like that? Given that nlme's gls and lme do return the correct result, I am wondering what gls and lme do differently. Is it that the option correlation=corCompSymm(form=~1|id) causes them to estimate the intraclass correlation directly, which can be either positive or negative, whereas lmer estimates a variance component, which cannot be negative (and ends up being estimated as zero in this case)?

Best Answer

As Ben Bolker already mentioned in the comments, the problem is as you suspect: The lmer() model gets tripped up because it attempts to estimate a variance component model, with the variance component estimates constrained to be non-negative. What I will try to do is give a somewhat intuitive understanding of what it is about your dataset that leads to this, and why this causes a problem for variance component models.

Here is a plot of your dataset. The white dots are the actual observations and the black dots are the subject means.

[Figure: the raw data -- white dots are observations, black dots are subject means]

To simplify things, without changing the spirit of the problem, I will subtract out the fixed effects (i.e., the FITNESS and TEST effects, as well as the grand mean) and treat the residual data as a one-way random-effects problem. So here's what the new dataset looks like:

[Figure: the residual data after removing the fixed effects, with subject means]

Look hard at the patterns in this plot. Think about how observations taken from the same subject differ from observations taken from different subjects. Specifically, notice the following pattern: when one of a subject's observations lies above (or below) that subject's mean, the other observations from the same subject tend to lie on the opposite side of the subject mean. And the further that observation is from the subject mean, the further the other observations tend to be from the subject mean on the opposite side. This indicates a negative intra-class correlation: two observations taken from the same subject actually tend to be less similar, on average, than two observations drawn purely at random from the dataset.

Another way to think about this pattern is in terms of the relative magnitudes of the between-subject and within-subject variance. There appears to be substantially greater within-subject variance than between-subject variance. Of course, we expect this to happen to some extent. After all, the within-subject variance is based on variation in the individual data points, while the between-subject variance is based on variation in means of the individual data points (i.e., the subject means), and we know that the variance of a mean tends to decrease as the number of things being averaged increases. But in this dataset the difference is quite striking: there is far more within-subject than between-subject variation. In fact, this difference is exactly why a negative intra-class correlation emerges.
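
To put rough numbers on this, here is a small sketch of mine that removes the fixed effects and then applies the classical one-way ANOVA estimator of the intra-class correlation to the residual data. (The estimate will not exactly equal the rho that lme reported above, since that model parameterizes things differently; the point is simply that it comes out negative.)

res <- resid(lm(PULSE ~ FITNESS*TEST, data=data))  # subtract out the fixed effects
ms  <- anova(lm(res ~ data$id))[["Mean Sq"]]       # between- and within-subject mean squares
k   <- 3                                           # observations per subject
(ms[1] - ms[2]) / (ms[1] + (k - 1)*ms[2])          # ANOVA ICC estimate: negative here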

Okay, so here is the problem. The variance component model assumes that each data point is the sum of a subject effect and an error: $y_{ij}=u_j+e_{ij}$, where $u_j$ is the effect of the $j$th subject and $e_{ij}$ is the error for the $i$th observation on that subject. So let's think about what would happen if there were truly 0 variance in the subject effects -- in other words, if the true between-subjects variance component were 0. Given an actual dataset generated under this model, if we were to compute sample means for each subject's observed data, those sample means would still have some non-zero variance, but they would reflect only error variance, and not any "true" subject variance (because we have assumed there is none).

So how variable would we expect these subject means to be? Well, basically each estimated subject effect is a mean, and we know the formula for the variance of a mean: $\text{var}(\bar{X})=\text{var}(X_i)/n$, where $n$ is the number of things being averaged. Now let's apply this formula to your dataset and see how much variance we would expect to see in the estimated subject effects if the true between-subjects variance component were exactly 0.

The within-subject variance works out to be $348$, and each subject effect is computed as the mean of 3 observations. So the expected standard deviation of the subject means -- assuming the true between-subject variance is 0 -- works out to be $\sqrt{348/3} \approx 10.8$. Now compare this to the standard deviation of the subject means that we actually observed: $4.3$! The observed variation is substantially less than what we would expect even with 0 between-subject variance. For a variance component model, the only way the observed variation could be expected to be as low as what we actually observed is if the true between-subject variance were somehow negative. And therein lies the problem. The data imply a negative variance component, but the software (sensibly) will not allow negative estimates of variance components, since a variance can in fact never be negative. The other models that you fit avoid this problem by directly estimating the intra-class correlation rather than assuming a simple variance component model.
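
To see this logic in action, here is a toy simulation of mine, plugging in the within-subject variance of $348$ from above, of a world where the true between-subject variance is exactly 0. The simulated subject means still vary, with a standard deviation near $\sqrt{348/3} \approx 10.8$ -- nowhere near the observed $4.3$:

set.seed(123)
n_subj <- 18; k <- 3
sim_sd <- replicate(10000, {
  y <- matrix(rnorm(n_subj*k, mean=0, sd=sqrt(348)), nrow=n_subj)  # pure error, no subject effects
  sd(rowMeans(y))                                                  # SD of the 18 simulated subject means
})
mean(sim_sd)  # roughly 10.6, close to sqrt(348/3) and far above the observed 4.3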

If you want to see how you could actually get the negative variance component estimate implied by your dataset, you can use the procedure that I illustrate (with accompanying R code) in this other recent answer of mine. That procedure is not totally trivial, but not too hard either (for a balanced design such as this one).
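
For a balanced design like this one you can also get a rough version of that negative estimate directly from the aov table above, via the expected mean squares (a method-of-moments sketch of mine, not the exact procedure linked): in the id stratum $E[\text{MS}] = \sigma^2_e + k\,\sigma^2_u$, while in the id:TEST stratum $E[\text{MS}] = \sigma^2_e$, so:

ms_id    <- 925/15    # Residuals mean square in the id stratum
ms_resid <- 3397/30   # Residuals mean square in the id:TEST stratum
k <- 3                # tests per subject
(var_u <- (ms_id - ms_resid)/k)  # about -17.2: the implied (negative) variance component
var_u + ms_resid                 # about 96.0, which is what lmer reported as the residual variance after forcing the id variance to 0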