Solved – How are the degrees of freedom in the emmeans package calculated? – R

lme4-nlme, lsmeans, post-hoc, r, t-test

Let us look at some sample data for 5 hypothetical subjects.

library(emmeans)
library(lme4)

# generate some sample data
# condition (Placebo, Treatment)
# type (some factor, e.g. two different medications)
# value (outcome of interest)
# subject

data <- data.frame(condition = as.factor(rep(c(0,1), times = 10)),
    value = rnorm(20), type = rep(c("A","B"), each = 10),
    subject = as.factor(rep(rep(1:5, each = 2), times = 2)))

Now we would like to fit a mixed model and run post-hoc tests with emmeans, comparing the conditions within each type in a pairwise fashion:

# fit a mixed model
fit_data <- lmer(value ~ condition*type + (1|subject), data = data)

# emmeans post-hoc test
emmeans(fit_data, pairwise ~ condition | type)

So for each type (A or B), I have paired data (condition 0 has 5 data points and condition 1 has 5 data points). If I run a paired t-test like so:

# paired t-test
t.test(data[data$condition==0 & data$type=="A","value"], data[data$condition==1 & data$type=="A","value"], paired = TRUE)
t.test(data[data$condition==0 & data$type=="B","value"], data[data$condition==1 & data$type=="B","value"], paired = TRUE)

I get 4 degrees of freedom for each paired t-test, but emmeans reports 12 degrees of freedom. Why is there such a large difference? If emmeans also used df = 4, the p-values would be more comparable.

Best Answer

Your t.test() results are each based on only selected portions of the data, whereas the emmeans() results are based on a model that is fitted to all of the data. Consequently, the denominators for the $t$ statistics are based on pooled information in the model. More information, more df.
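
As a side note on where those denominator df come from: for lmer models, emmeans approximates them from the fitted model (by default with the Kenward-Roger method, via the pbkrtest package). Here is a minimal sketch, using the fit_data model from the question, of how to select a different approximation via the lmer.df argument:

# sketch: choosing the denominator-df approximation for an lmer fit
# (Kenward-Roger is the emmeans default for merMod objects)
emmeans(fit_data, pairwise ~ condition | type, lmer.df = "kenward-roger")
emmeans(fit_data, pairwise ~ condition | type, lmer.df = "satterthwaite")
emmeans(fit_data, pairwise ~ condition | type, lmer.df = "asymptotic")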

More details

Let me create your data in a reproducible way, by providing the random-number seed:

set.seed(12345)
data <- data.frame(condition = as.factor(rep(c(0,1), times = 10)), 
    value = rnorm(20), type = rep(c("A","B"), each = 10), 
    subject = as.factor(rep(rep(1:5, each = 2), times = 2)))

Here are your t-test results:

> t.test(data[data$condition==0 & data$type=="A","value"], 
+        data[data$condition==1 & data$type=="A","value"], paired = TRUE)

    Paired t-test

data:  data[data$condition == 0 & data$type == "A", "value"] and 
       data[data$condition == 1 & data$type == "A", "value"]
t = 1.9384, df = 4, p-value = 0.1246
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.3618957  2.0361137
sample estimates:
mean of the differences 
               0.837109 
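
The 4 d.f. here are simply the 5 subjects minus 1: a paired t test is a one-sample t test on the within-subject differences. Because the rows of data are ordered by subject within each type, a minimal check looks like this:

# paired t test = one-sample t test on the 5 within-subject differences
dA <- with(data, value[condition == 0 & type == "A"] -
                 value[condition == 1 & type == "A"])
t.test(dA)   # df = 5 - 1 = 4; same t and p-value as the paired test above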

Now we compare with emmeans() results. The important thing to know about emmeans() is that it provides an interpretation of a fitted model, not of the dataset itself. If you give it a different fitted model, you will get different results. Here is what we get with your model:

> library(lme4)
> mod1 = lmer(value ~ condition*type + (1|subject), data = data)
> pairs(emmeans(mod1, ~ condition | type))

type = A:
 contrast estimate    SE df t.ratio p.value
 0 - 1       0.837 0.475 12  1.764  0.1032 

type = B:
 contrast estimate    SE df t.ratio p.value
 0 - 1      -0.677 0.475 12 -1.426  0.1795 

Each t test shown indeed has 12 d.f., and that is because of information extracted from the model provided to it, mod1. Now consider a different model, one that is fitted only to the data for type A:

> mod2 = lmer(value ~ condition + (1|subject), data = data, 
+             subset = (type == "A"))

This model does not include type as a predictor, so of course we have to alter the emmeans() call accordingly:

> pairs(emmeans(mod2, ~ condition))
 contrast estimate    SE df t.ratio p.value
 0 - 1       0.837 0.432  4 1.938   0.1246 

With model mod2, we get only 4 d.f. And, in fact, the t test and P value are identical to what was obtained above using t.test().

Again, the point is that emmeans() summarizes a fitted model. The model mod1 uses data from both types, and it incorporates an assumption that the error variance is the same with each type, and that the subject variance is the same with each type. It pools the information from the two types to obtain more accurate* estimates of these variances, and that is reflected in the increased degrees of freedom. [* more accurate, assuming the model assumptions are correct!]
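
You can see this pooling directly in the variance components of mod1, which are estimated from all 20 observations and shared by both types. As a rough check (the displayed SE may also reflect the small-sample Kenward-Roger adjustment, so expect only approximate agreement):

# pooled variance components from mod1
VarCorr(mod1)          # subject (intercept) SD and residual SD
sigma(mod1)            # pooled residual SD

# in this balanced design the subject effects cancel in a within-type
# contrast, so its SE is roughly sqrt(2 * sigma^2 / 5)
sqrt(2 * sigma(mod1)^2 / 5)   # compare with the 0.475 shown above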

Because of this, if you are satisfied, based on diagnostic plots, etc., that mod1 fits the data adequately, then the emmeans() results from mod1 are better than those from mod2 because more efficient use is made of available information. If, on the other hand, mod1 does not fit well, mod2 may be better and hence you should give that model to emmeans().
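
For those diagnostic checks, a minimal sketch using standard lme4 and base tools:

# quick checks of mod1's assumptions
plot(mod1)                                # residuals vs. fitted values
qqnorm(resid(mod1)); qqline(resid(mod1))  # normality of residuals
boxplot(resid(mod1) ~ data$type,          # similar residual spread for A and B?
        xlab = "type", ylab = "residual")

If the equal-variance assumption is the main concern, a middle ground is to keep all of the data but allow a separate residual variance per type, for example (a sketch only) with nlme::lme and a varIdent() variance structure, which emmeans also supports:

library(nlme)
mod3 <- lme(value ~ condition * type, random = ~ 1 | subject,
            weights = varIdent(form = ~ 1 | type), data = data)
pairs(emmeans(mod3, ~ condition | type))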
