I have a trial-wise linear mixed model with a categorical between-subjects factor group (A vs. B), two categorical within-subjects factors itemEmotion (neutral vs. negative) and neuralArea (frontal vs. posterior), their interactions, and random intercepts for subject (n = 52, 26 per group) and item (60 items in total: 30 neutral, 30 negative):
LMM <- lmer(neuralActivity ~ group * itemEmotion * neuralArea +
            (1 | subject) + (1 | item), data = data)
summary(LMM)
Based on a significant group × neuralArea interaction, I ran post-hoc tests on the difference between the frontal and posterior neuralArea within each group using emmeans():
LMM.emmeans <- emmeans(LMM, pairwise ~ neuralArea | group, lmer.df = "satterthwaite",
                       lmerTest.limit = 6240)
summary(LMM.emmeans)
which gave these results:
$emmeans
group = A:
neuralArea emmean SE df lower.CL upper.CL
anterior 0.00295 0.00133 119 0.0003165 0.00559
posterior 0.00536 0.00133 119 0.0027233 0.00800
group = B:
neuralArea emmean SE df lower.CL upper.CL
anterior 0.00257 0.00133 119 -0.0000661 0.00521
posterior 0.01082 0.00133 119 0.0081836 0.01346
Results are averaged over the levels of: itemEmotion
Degrees-of-freedom method: satterthwaite
Confidence level used: 0.95
$contrasts
group = A:
contrast estimate SE df t.ratio p.value
anterior - posterior -0.00241 0.00157 6124 -1.531 0.1258
group = B:
contrast estimate SE df t.ratio p.value
anterior - posterior -0.00825 0.00157 6124 -5.248 <.0001
Results are averaged over the levels of: itemEmotion
Degrees-of-freedom method: satterthwaite
What really confuses me are the 6124 df for the t-tests, as I've never seen such high df reported. Is such a number possible with only 60 trials and 26 participants per group, or is something off here? Note that in this example I set lmerTest.limit = 6240, because otherwise the df are reported as Inf and z-tests are computed instead of t-tests. I used Satterthwaite's method because it is much faster to compute than Kenward-Roger, but both methods give me the same degrees of freedom for these contrasts.
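For reference, the same limits can also be raised once, globally, with `emm_options()` instead of being passed in every call (the value 6240 here matches the number of rows in my data):

```r
library(emmeans)
# raise the row-count limits above which emmeans falls back to
# asymptotic (z) results for Satterthwaite / Kenward-Roger d.f.
emm_options(lmerTest.limit = 6240, pbkrtest.limit = 6240)
```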
Also, p-value adjustment for multiple testing does not seem to work for any of the contrasts I computed for this model: I get the same p-values and no note about the adjustment in the output, whichever adjustment method (e.g. "fdr", "bonferroni", "none") I specify in either the emmeans() or the summary() call. The adjustment does work with multcomp::glht(), but that gives me only z-values instead of t-values, and again no df at all. Or is a z-test actually more appropriate in this case?
However, I'm mainly interested in whether those degrees of freedom for the t-tests make sense.
Thank you very much in advance!
Best Answer
First, p-value adjustments are done separately within each "by" group, and you have only one comparison in each group; with no multiplicity of tests, no multiplicity adjustment is needed or performed. If you do
summary(LMM.emmeans$contrasts, by = NULL, adjust = "bonf")
you will see an adjustment for the two comparisons treated as one family of tests.

Second, when you have a within-subjects comparison, the subject effects cancel out because both observations come from the same subject. That means the degrees of freedom needed to estimate the between-subject variation do not play a role, which makes the d.f. for the comparison much larger than the d.f. for the means themselves. It will not exceed the number of observations in the dataset, however.
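Both points can be seen in a minimal self-contained sketch on simulated data with the same structure (hypothetical variable names; itemEmotion omitted and pure-noise responses used for brevity, so expect a possible singular-fit message):

```r
library(lme4)
library(emmeans)

set.seed(1)
# simulated design: 52 subjects x 60 items x 2 areas, groups split 26/26
d <- expand.grid(subject = factor(1:52), item = factor(1:60),
                 neuralArea = c("anterior", "posterior"))
d$group <- ifelse(as.integer(d$subject) <= 26, "A", "B")
d$neuralActivity <- rnorm(nrow(d))

m <- lmer(neuralActivity ~ group * neuralArea +
            (1 | subject) + (1 | item), data = d)

em <- emmeans(m, pairwise ~ neuralArea | group,
              lmer.df = "satterthwaite", lmerTest.limit = 1e5)

# Separately by group: one contrast per "by" group, so there is
# nothing to adjust and no adjustment note appears.
summary(em$contrasts)

# Dropping the "by" grouping pools both contrasts into one family
# of two tests; Bonferroni now applies (p values doubled, capped at 1).
summary(em$contrasts, by = NULL, adjust = "bonferroni")
```

In the second summary you should see the note "P value adjustment: bonferroni method for 2 tests", and the contrast d.f. will be far larger than the d.f. of the means, for the within-subject cancellation reason described above.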