Solved – lme4 and lmerTest: Different degrees of freedom from the same dataset

lme4-nlmer

I just encountered a problem while analyzing experimental data with lme4 and lmerTest. In the experiment, 67 subjects gave 3 ratings for 50 stimuli shown at 3 different durations (a total of 10050 responses). I used the same nested model for each of the 3 ratings (the only difference being the response variable), but the denominator dfs are different for each of the 3 lmerTest ANOVAs (1808, 9848, 1807). How is this possible?

A similar question was asked for mixed effects in SPSS, but it remained unanswered: Different degrees of freedom when using the same mixed effect model (SPSS)

Edit: Looking more closely at the models using summary(), I found that in the analysis that yielded the much higher df (9848), one random-effect group has a variance of 0. Might this be the reason? (The factor cd is nested within the duration factor named dur.)

Low df model:

Random effects:
Groups        Name        Variance Std.Dev.
cd:(dur:subj) (Intercept) 0.13794  0.3714  
dur:subj      (Intercept) 0.04408  0.2099  
subj          (Intercept) 0.41425  0.6436  
Residual                  1.58692  1.2597  
Number of obs: 10050, groups:  cd:(dur:subj), 2010; dur:subj, 201; subj, 67

High df model:

Random effects:
Groups        Name        Variance Std.Dev.
cd:(dur:subj) (Intercept) 0.00000  0.0000  
dur:subj      (Intercept) 0.06821  0.2612  
subj          (Intercept) 0.39013  0.6246  
Residual                  1.74595  1.3213  
Number of obs: 10050, groups:  cd:(dur:subj), 2010; dur:subj, 201; subj, 67

Best Answer

Before jumping into a (partial, non-rigorous) explanation, let me point out that all of your denominator dfs are so large that there should be no practical difference between them. For any df greater than about 50, there's little difference between the $t$ distribution and the corresponding Normal distribution, or between the $F$ distribution and the corresponding (scaled) $\chi^2$ distribution -- unless you are doing something like estimating 99.999% confidence intervals. For example, qnorm(0.975) (the upper tail cutoff of a symmetric 95% interval for a standard Normal) is 1.959964, while qt(0.975,df=1000) is 1.962339; changing the df to 10000 will hardly affect the result. (The comparison for $F$ vs. $\chi^2$ is almost identical, but slightly more annoying to compute because of the difference in scaling.)
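A quick check in R makes this concrete (the numbers depend only on the quantile functions, not on your data):

## t vs. Normal: the 97.5% quantile barely moves once the df are large
qnorm(0.975)             ## 1.959964
qt(0.975, df = 1000)     ## 1.962339
qt(0.975, df = 10000)    ## ~1.96020
## F vs. (scaled) chi-square: divide the chi-square quantile by df1 to match scales
qf(0.95, df1 = 1, df2 = 1000)   ## ~3.8508
qchisq(0.95, df = 1) / 1        ## 3.841459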

You can play with lme to get the classical estimate of the degrees of freedom (be aware that lme's df-calculation heuristic can fail badly for random-slope models ...)

## simulate a balanced design the same size as yours:
## 67 subjects x 50 stimuli x 3 durations = 10050 rows of pure noise
dd <- expand.grid(subj = 1:67, stim = 1:50, dur = 1:3)
dd$y <- rnorm(nrow(dd))
library(nlme)
## intercept-only model with the same nesting; lme reports classical df
anova(lme(y ~ 1, random = ~1 | subj/dur/stim, data = dd))
##            numDF denDF  F-value p-value
## (Intercept)     1  9849 2.520658  0.1124

The df value here is 67*3*49 ... one important thing you haven't told us is what fixed effects you are actually testing, and at what level they vary. I would guess that they don't actually vary at the lowest level, so that in the classical case you would essentially be testing against a denominator based on the subject $\times$ duration SS.
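To see the classical machinery put the denominator in that stratum, you can add a duration-level fixed effect to the toy example above (still just simulated noise; factor(dur) stands in for whatever your real fixed effect is):

## a fixed effect that varies between durations but is constant within subj:dur cells
anova(lme(y ~ factor(dur), random = ~1 | subj/dur/stim, data = dd))
## the denominator df for factor(dur) should now come from the subj:dur stratum
## (somewhere around 201 - 67 = 134 groups' worth), not the ~9849 of the lowest level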

I think the fact that the lowest-level variance (cd:(dur:subj)) is zero in the high-df model is actually a good clue. No variance among stimuli within duration essentially means that the responses are more or less independent, so you get closer to the full possible df for the model ...
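(As an aside, if you want the software to flag this kind of boundary estimate explicitly, recent lme4 versions can check the fitted object directly; m_high below is just a placeholder name for your high-df model:)

library(lme4)
isSingular(m_high)   ## TRUE when some variance component is estimated as exactly 0
VarCorr(m_high)      ## shows which component collapsed (here cd:(dur:subj))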

It can't be a coincidence that the other dfs you are getting are near 67*27 = 1809, but I'm not sure where the 27 would come from ...

I would welcome more rigorous/better-informed answers, especially about what the df would be in the classical case ...
