Update 3 (May, 2013): Another really good paper on mixed models in Psychology was published in the Journal of Memory and Language (although I do not agree with the authors' conclusions on how to obtain p-values; see the afex package instead). It very nicely discusses how to specify the random effects structure. Go read it!
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278. doi:10.1016/j.jml.2012.11.001
Update 2 (July, 2012): A paper advocating the use of mixed models in (Social) Psychology when there are crossed (e.g., participants and items) random effects. The big thing is that it shows how to obtain p-values using the pbkrtest package:
Judd, C. M., Westfall, J., & Kenny, D. A. (2012). Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. Journal of Personality and Social Psychology, 103(1), 54–69. doi:10.1037/a0028347
(only available as a Word .doc)
Jake Westfall told me (via e-mail) that an alternative to the advocated Kenward-Roger approximation (used in pbkrtest) for obtaining p-values is the (less optimal) Satterthwaite approximation, which can be found in the MixMod package via the anovaTab function.
Small update to the last update: My R package afex contains the function mixed() to conveniently obtain p-values for all effects in a mixed model. Alternatively, the car package now also obtains p-values for mixed models in Anova() using test.statistic = "F".
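As a minimal sketch of how these two routes look in practice (using the ANT data from the ez package purely for illustration; method = "KR" requests the Kenward-Roger approximation via pbkrtest):

```r
library(afex)
library(ez)
library(lme4)
library(car)
data(ANT)

## afex::mixed fits the model and returns an ANOVA-like table
## with F values and p-values for all effects
m1 <- mixed(rt ~ group * direction + (1 | subnum), data = ANT, method = "KR")
m1

## alternatively, car::Anova on a plain lmer fit;
## test.statistic = "F" gives Kenward-Roger F tests
m2 <- lmer(rt ~ group * direction + (1 | subnum), data = ANT)
Anova(m2, test.statistic = "F")
```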
Update 1: Another paper describing lme4:
Kliegl, R., Wei, P., Dambacher, M., Yan, M., & Zhou, X. (2011). Experimental effects and individual differences in linear mixed models: estimating the relationship between spatial, object, and attraction effects in visual attention. Frontiers in Quantitative Psychology and Measurement, 1, 238. doi:10.3389/fpsyg.2010.00238
Original Response:
I do not have a number of examples, only one (see below), but I know some papers you should cite from Psychology/Cognitive Science. The most important one is definitely:
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412. doi:10.1016/j.jml.2007.12.005
Another one from Baayen is:
Baayen, R. H., & Milin, P. (2010). Analyzing Reaction Times. International Journal of Psychological Research, 3(2), 12–28.
I really liked his book, too, which also has a nice introductory chapter on mixed models (and is pretty cheap for a stats book):
Baayen, R. H. (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge, UK; New York: Cambridge University Press.
I would guess he also has a lot of papers using lme4, but as my main interest is not psycholinguistics, you might want to check his homepage.
From my field (reasoning), I know of this one paper that uses lme4:
Fugard, A. J. B., Pfeifer, N., Mayerhofer, B., & Kleiter, G. D. (2011). How people interpret conditionals: Shifts toward the conditional event. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37(3), 635–648. doi:10.1037/a0022329
(although I have the feeling they use a likelihood ratio test to compare models that differ only in the fixed effects, which I have heard is not the correct way. I think you should use AIC instead.)
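For what it's worth, the two kinds of comparison would look like this in lme4 (a hypothetical sketch using the sleepstudy data shipped with lme4; for a likelihood ratio test on fixed effects, the models should be refitted with ML, i.e. REML = FALSE):

```r
library(lme4)
data(sleepstudy)

## two models differing only in the fixed effects, fitted with ML
m0 <- lmer(Reaction ~ 1    + (Days | Subject), sleepstudy, REML = FALSE)
m1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, REML = FALSE)

anova(m0, m1)  # likelihood ratio test (can be anti-conservative for fixed effects)
AIC(m0, m1)    # information-criterion comparison instead
```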
I think that lmerTest is getting it right and ezANOVA is getting it wrong in this case:
- the results from lmerTest agree with my intuition/understanding
- two different computations in lmerTest (Satterthwaite and Kenward-Roger) agree
- they also agree with nlme::lme
- when I run it, ezANOVA gives a warning, which I don't entirely understand, but which should not be disregarded ...
Re-running the example:
library(ez); library(lmerTest); library(nlme)
data(ANT)
ANT.2 <- subset(ANT, !error)
set.seed(101) ## for reproducibility
baseline.shift <- rnorm(length(unique(ANT.2$subnum)), 0, 50)
ANT.2$rt <- ANT.2$rt + baseline.shift[as.numeric(ANT.2$subnum)]
Figure out the experimental design:
with(ANT.2,table(subnum,group,direction))
So it looks like individuals (subnum) are placed in either control or treatment groups, and each is tested for both directions -- i.e., direction can be tested within individuals (denominator df is large), but group and group:direction can only be tested among individuals.
(anova.ez <- ezANOVA(data = ANT.2, dv = .(rt), wid = .(subnum),
within = .(direction), between = .(group)))
## $ANOVA
## Effect DFn DFd F p p<.05 ges
## 2 group 1 18 2.4290721 0.13651174 0.1183150147
## 3 direction 1 18 0.9160571 0.35119193 0.0002852171
## 4 group:direction 1 18 4.9169156 0.03970473 * 0.0015289914
Here I get Warning: collapsing data to cell means. *IF* the requested effects are a subset of the full design, you must use the "within_full" argument, else results may be inaccurate.
The denominator DF look a little funky (all equal to 18): I think they should be larger for direction and group:direction, which can be tested independently (but would be smaller if you added (direction | subnum) to the model)?
# similarly with lmer and lmerTest::anova
model <- lmer(rt ~ group * direction + (1 | subnum), data = ANT.2)
lmerTest::anova(model)
## Df Sum Sq Mean Sq F value Denom Pr(>F)
## group 1 12065.7 12065.7 2.4310 18 0.1364
## direction 1 1952.2 1952.2 0.3948 5169 0.5298
## group:direction 1 11552.2 11552.2 2.3299 5169 0.1270
The Df column here refers to the numerator df; Denom (second-to-last) gives the estimated denominator df; they agree with the classical intuition. More importantly, we also get different answers for the F values ...
We can also double-check with Kenward-Roger (very slow because it involves refitting the model several times)
lmerTest::anova(model,ddf="Kenward-Roger")
The results are identical.
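The same Kenward-Roger test can also be obtained directly from pbkrtest by comparing nested models (a sketch reusing the ANT.2 data and model from above; KRmodcomp tests the terms dropped from the larger model):

```r
library(pbkrtest)

## full model and a reduced model without the interaction
full    <- lmer(rt ~ group * direction + (1 | subnum), data = ANT.2)
reduced <- update(full, . ~ . - group:direction)

## F test for group:direction with Kenward-Roger denominator df
KRmodcomp(full, reduced)
```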
For this example, lme (from the nlme package) actually does a perfectly good job of guessing the appropriate denominator df (the F and p-values are very slightly different):
model3 <- lme(rt ~ group * direction, random=~1|subnum, data = ANT.2)
anova(model3)[-1,]
## numDF denDF F-value p-value
## group 1 18 2.4334314 0.1362
## direction 1 5169 0.3937316 0.5304
## group:direction 1 5169 2.3298847 0.1270
If I fit an interaction between direction and subnum, the df for direction and group:direction are much smaller (I would have thought they would be 18, but maybe I'm getting something wrong):
model2 <- lmer(rt ~ group * direction + (direction | subnum), data = ANT.2)
lmerTest::anova(model2)
## Df Sum Sq Mean Sq F value Denom Pr(>F)
## group 1 20334.7 20334.7 2.4302 17.995 0.1364
## direction 1 1804.3 1804.3 0.3649 124.784 0.5469
## group:direction 1 10616.6 10616.6 2.1418 124.784 0.1459
Best Answer
Unless I'm reading you incorrectly, you would have approximately this many degrees of freedom in a standard repeated measures ANOVA. The ANOVA error term is the interaction between subjects and the effect, and it takes its degrees of freedom from both. However, if you are taking multiple measurements at each time interval, then yes, the degrees of freedom are much higher for the mixed effects model.
It's not the standard pseudoreplication problem you'd have if you were doing a repeated measures ANOVA. With the ANOVA you are only modelling the effects in question and should aggregate your data to get better estimates of each effect for each subject. With mixed effects modelling you are potentially modelling each data point, even the pseudoreplicated measurements you take for accuracy's sake. You can do that because you're explicitly saying that these individual measures are grouped within subjects, within this factor, etc. Therefore, you can often have more degrees of freedom. Although, as I stated before, it doesn't sound like that's the case here anyway.
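To illustrate the aggregation point: in the classical approach you would first average the pseudoreplicated measurements within each subject-by-condition cell and then run the repeated measures ANOVA on the cell means (a sketch reusing the ANT.2 data from above):

```r
## collapse to one mean RT per subject x group x direction cell
cell.means <- aggregate(rt ~ subnum + group + direction, data = ANT.2, FUN = mean)

## classical repeated measures ANOVA on the aggregated data:
## direction is within-subject, group is between-subject
summary(aov(rt ~ group * direction + Error(subnum / direction),
            data = cell.means))
```

This is essentially what ezANOVA does internally when it warns about "collapsing data to cell means".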