Post-Hoc Tests in LMER Models – Huge Degrees of Freedom and Emmeans Analysis

Tags: degrees-of-freedom, lme4-nlme, lsmeans, post-hoc, t-test

I have a trial-wise linear mixed model with a categorical between-subjects factor group (A vs. B), two categorical within-subjects factors itemEmotion (neutral vs. negative) and neuralArea (frontal vs. posterior), their interactions, and random intercepts for each subject (n = 52, 26 per group) and item (60 items in total: 30 neutral, 30 negative):

library(lmerTest)  # loads lme4 and provides Satterthwaite df in summary()

LMM <- lmer(neuralActivity ~ group * itemEmotion * neuralArea +
              (1 | subject) + (1 | item), data = data)
summary(LMM)

Based on a significant group x neuralArea interaction, I ran post-hoc tests on the difference between frontal and posterior neuralArea within each group using emmeans():

library(emmeans)

LMM.emmeans <- emmeans(LMM, pairwise ~ neuralArea | group,
                       lmer.df = "satterthwaite", lmerTest.limit = 6240)
summary(LMM.emmeans)

which gave these results:

$emmeans
group = A:
 neuralArea  emmean      SE  df   lower.CL upper.CL
 anterior    0.00295 0.00133 119  0.0003165  0.00559
 posterior   0.00536 0.00133 119  0.0027233  0.00800

group = B:
 neuralArea  emmean      SE  df   lower.CL upper.CL
 anterior    0.00257 0.00133 119 -0.0000661  0.00521
 posterior   0.01082 0.00133 119  0.0081836  0.01346

Results are averaged over the levels of: itemEmotion 
Degrees-of-freedom method: satterthwaite 
Confidence level used: 0.95 

$contrasts
group = A:
 contrast              estimate      SE   df t.ratio p.value
 anterior - posterior  -0.00241 0.00157 6124  -1.531  0.1258

group = B:
 contrast               estimate      SE   df t.ratio p.value
 anterior - posterior   -0.00825 0.00157 6124  -5.248  <.0001

Results are averaged over the levels of: itemEmotion 
Degrees-of-freedom method: satterthwaite 

What really confuses me are the 6124 df for the t-tests, as I've never seen such high df reported. Is such a number possible with only 60 trials and 26 participants per group, or is something off here? Note that I set lmerTest.limit = 6240 because otherwise the df are reported as Inf and z-tests are computed instead of t-tests. I used Satterthwaite's method because it is much faster to compute than Kenward-Roger, but both methods give me the same degrees of freedom for these contrasts.
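For reference, this is a sketch of the Kenward-Roger variant I compared against; for this method emmeans uses pbkrtest.limit rather than lmerTest.limit (6240 being the number of rows in my data):

# Kenward-Roger version of the call above (can be much slower);
# pbkrtest.limit plays the role that lmerTest.limit plays for Satterthwaite.
emmeans(LMM, pairwise ~ neuralArea | group,
        lmer.df = "kenward-roger", pbkrtest.limit = 6240)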

Also, p-value adjustment for multiple testing does not seem to work for any contrasts I computed from this model: I get the same p-values and no note about the adjustment in the output for every adjustment method (e.g. adjust = "fdr", "bonferroni", "none") that I added to either the emmeans() or the summary() call. The adjustment does work using multcomp::glht(), but that gives me only z-values instead of t-values and, again, no df at all. Or is a z-test actually more appropriate in this case?

Mainly, though, I'm interested in whether those degrees of freedom for the t-tests make sense.

Thank you very much in advance!

Best Answer

First, P-value adjustments are done separately for each "by" group, and you have only one comparison in each group; hence there is no multiplicity of tests, and no multiplicity adjustment is needed or done. If you do summary(LMM.emmeans$contrasts, by = NULL, adjust = "bonf"), you will see an adjustment for the two comparisons considered as one family of tests.
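As a runnable sketch of that suggestion:

# Pool the two anterior - posterior contrasts into one family of tests and
# Bonferroni-adjust across it (each raw p-value is doubled, capped at 1).
summary(LMM.emmeans$contrasts, by = NULL, adjust = "bonferroni")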

Second, when you have a within-subjects comparison, the subject effects cancel out because they are measured on the same subjects. That means the degrees of freedom needed to estimate the subject variation do not play a role, which makes the d.f. for the comparison much greater than the d.f. for the means themselves. It will not exceed the number of observations in the dataset, however.
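As a quick sanity check (assuming one row per subject x item x neuralArea cell, which is what lmerTest.limit = 6240 suggests):

# Upper bound on the contrast d.f.: the number of observations.
n_obs <- 52 * 60 * 2   # subjects x items x areas = 6240
n_obs > 6124           # TRUE: the reported contrast d.f. stays below the bound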
