Solved – How are the degrees of freedom in the emmeans package calculated? – R

lme4-nlme, lsmeans, post-hoc, r, t-test

Let us look at some sample data for 5 hypothetical subjects.

library(emmeans)
library(lme4)

# generate some sample data
# condition (Placebo, Treatment)
# type (some factor, e.g. two different medications)
# value (outcome of interest)
# subject

data <- data.frame(condition = as.factor(rep(c(0,1), times = 10)),
    value = rnorm(20), type = rep(c("A","B"), each = 10),
    subject = as.factor(rep(rep(1:5, each = 2), times = 2)))

Now we would like to fit a mixed model and run post-hoc tests with emmeans, comparing the conditions within each type in a pairwise fashion:

# fit a mixed model
fit_data <- lmer(value ~ condition*type + (1|subject), data = data)

# emmeans post-hoc test
emmeans(fit_data, pairwise ~ condition | type)

So for each type (A or B), I have paired data (condition 0 has 5 data points and condition 1 has 5 data points). If I run a paired t-test like so:

# paired t-test
t.test(data[data$condition==0 & data$type=="A","value"], data[data$condition==1 & data$type=="A","value"], paired = TRUE)
t.test(data[data$condition==0 & data$type=="B","value"], data[data$condition==1 & data$type=="B","value"], paired = TRUE)

I get 4 degrees of freedom for each paired t-test, but emmeans reports 12 degrees of freedom. Why is there such a large difference? If emmeans also used df = 4, the p-values would be more comparable.

Best Answer

Your t.test() results are each based on only selected portions of the data, whereas the emmeans() results are based on a model that is fitted to all of the data. Consequently, the denominators for the $t$ statistics are based on pooled information in the model. More information, more df.
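
As a side note on where those denominator df come from: for lmer models, emmeans approximates them from the fitted model (by default with the Kenward-Roger method, via the pbkrtest package). Here is a minimal sketch, using the fit_data model from the question, of how to select a different approximation via the lmer.df argument:

# sketch: choosing the denominator-df approximation for an lmer fit
# (Kenward-Roger is the emmeans default for merMod objects)
emmeans(fit_data, pairwise ~ condition | type, lmer.df = "kenward-roger")
emmeans(fit_data, pairwise ~ condition | type, lmer.df = "satterthwaite")
emmeans(fit_data, pairwise ~ condition | type, lmer.df = "asymptotic")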

More details

Let me create your data in a reproducible way, by providing the random-number seed:

set.seed(12345)
data <- data.frame(condition = as.factor(rep(c(0,1), times = 10)), 
    value = rnorm(20), type = rep(c("A","B"), each = 10), 
    subject = as.factor(rep(rep(1:5, each = 2), times = 2)))

Here are your t-test results:

> t.test(data[data$condition==0 & data$type=="A","value"], 
+        data[data$condition==1 & data$type=="A","value"], paired = TRUE)

    Paired t-test

data:  data[data$condition == 0 & data$type == "A", "value"] and 
       data[data$condition == 1 & data$type == "A", "value"]
t = 1.9384, df = 4, p-value = 0.1246
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.3618957  2.0361137
sample estimates:
mean of the differences 
               0.837109 
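
The 4 d.f. here are simply the 5 subjects minus 1: a paired t test is a one-sample t test on the within-subject differences. Because the rows of data are ordered by subject within each type, a minimal check looks like this:

# paired t test = one-sample t test on the 5 within-subject differences
dA <- with(data, value[condition == 0 & type == "A"] -
                 value[condition == 1 & type == "A"])
t.test(dA)   # df = 5 - 1 = 4; same t and p-value as the paired test above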

Now we compare with emmeans() results. The important thing to know about emmeans() is that it provides an interpretation of a fitted model, not of the dataset itself. If you give it a different fitted model, you will get different results. Here is what we get with your model:

> library(lme4)
> mod1 = lmer(value ~ condition*type + (1|subject), data = data)
> pairs(emmeans(mod1, ~ condition | type))

type = A:
 contrast estimate    SE df t.ratio p.value
 0 - 1       0.837 0.475 12  1.764  0.1032 

type = B:
 contrast estimate    SE df t.ratio p.value
 0 - 1      -0.677 0.475 12 -1.426  0.1795 

Each t test shown indeed has 12 d.f., and that is because of information extracted from the model provided to it, mod1. Now consider a different model, one that is fitted only to the data for type A:

> mod2 = lmer(value ~ condition + (1|subject), data = data, 
+             subset = (type == "A"))

This model does not include type as a predictor, so of course we have to alter the emmeans() call accordingly:

> pairs(emmeans(mod2, ~ condition))
 contrast estimate    SE df t.ratio p.value
 0 - 1       0.837 0.432  4 1.938   0.1246 

With model mod2, we get only 4 d.f. And, in fact, the t test and P value are identical to what was obtained above using t.test().

Again, the point is that emmeans() summarizes a fitted model. The model mod1 uses data from both types, and it incorporates an assumption that the error variance is the same with each type, and that the subject variance is the same with each type. It pools the information from the two types to obtain more accurate* estimates of these variances, and that is reflected in the increased degrees of freedom. [* more accurate, assuming the model assumptions are correct!]
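
You can see this pooling directly in the variance components of mod1, which are estimated from all 20 observations and shared by both types. As a rough check (the displayed SE may also reflect the small-sample Kenward-Roger adjustment, so expect only approximate agreement):

# pooled variance components from mod1
VarCorr(mod1)          # subject (intercept) SD and residual SD
sigma(mod1)            # pooled residual SD

# in this balanced design the subject effects cancel in a within-type
# contrast, so its SE is roughly sqrt(2 * sigma^2 / 5)
sqrt(2 * sigma(mod1)^2 / 5)   # compare with the 0.475 shown above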

Because of this, if you are satisfied, based on diagnostic plots, etc., that mod1 fits the data adequately, then the emmeans() results from mod1 are better than those from mod2 because more efficient use is made of available information. If, on the other hand, mod1 does not fit well, mod2 may be better and hence you should give that model to emmeans().
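
For those diagnostic checks, a minimal sketch using standard lme4 and base tools:

# quick checks of mod1's assumptions
plot(mod1)                                # residuals vs. fitted values
qqnorm(resid(mod1)); qqline(resid(mod1))  # normality of residuals
boxplot(resid(mod1) ~ data$type,          # similar residual spread for A and B?
        xlab = "type", ylab = "residual")

If the equal-variance assumption is the main concern, a middle ground is to keep all of the data but allow a separate residual variance per type, for example (a sketch only) with nlme::lme and a varIdent() variance structure, which emmeans also supports:

library(nlme)
mod3 <- lme(value ~ condition * type, random = ~ 1 | subject,
            weights = varIdent(form = ~ 1 | type), data = data)
pairs(emmeans(mod3, ~ condition | type))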
