Solved – Pairwise comparisons via emmeans

lsmeansmixed modelmultiple-comparisons

I have a question about emmeans and mixed effect model.
I fit a complex model using lmer() with the following variables:

A: a binary categorical predictor, between-subject

B: a binary categorical predictor, within-subject

C: a categorical predictor with 3 levels, within-subject

The model is as follows:

library(lme4)
model= lmer(d ~ A * B * C + (1|subject), data = ddata, REML = FALSE)

The results show a significant interaction between the predictors. I need to know the pairwise comparisons so I use emmeans as follow:

library(emmeans)
emmeans(model, pairwise ~ A * B * C)

emmeans use the Tukey method for the pairwise comparisons. A reviewer of my paper does not like Tukey method as she said:

"Tukey’s HSD assumes all measures are independent from each other and is not appropriate for repeated measures. Please consider performing a different test even if that changes the results."

My questions are:

Is the claim that Tukey in this case is not appropriate? if no can you cite any source?

if yes what is the alternative method?

Best Answer

Your reviewer has a good point, because you have both between- and within-subject comparisons in your collection, and those have different standard errors.

But the other thing I notice is that there are a total of $2\times2\times3=12$ cell means, and the emmeans() call shown produces all ${12\choose2}=66$ possible comparisons. I truly wonder if all of those are really of interest. If you do, I suggest trying the "mvt" adjustment, which is the same idea as Tukey but is based on the multivariate $t$ distribution with the actual correlation structure in your model (which in this case is not the same as the correlation structure assumed by the Tukey method).

However, often when there is an interaction, people opt for "simple" comparisons -- in this case, comparing the levels of one factor while holding the other two fixed. Those are easily done via

emm <- emmeans(model, ~ A * B * C)
simp <- pairs(emm, simple = "each")
simp

This will yield 6 comparisons of the levels of A, 6 comparisons of the two levels of B, and 4 sets of 3 comparisons among the levels of C, for a total of 24 comparisons instead of 66. Moreover, the issues of Tukey being inappropriate go away, because each set of simple comparisons is homogeneous.

Some additional comments:

While REML = FALSE is often a good idea for testing your model against other models, I recommend refitting your model with REML = TRUE before proceding to post hoc comparisons, because the REML method reduces bias in the estimates.
Consider using meaningful names in place of A, B, and C. We're not doing a generic math problem here; we are trying to tell a useful story about an actual experiment. Meaningful names help everybody understand what you're talking about.
I suggest doing things in steps, as shown above, over trying to get every result you want in one R call. That makes it more natural to focus on particular results or go in different directions; e.g., summary(emm) shows the 12 cell means and pairs(emm, by = “C”) could be used to compare the four A:B combinations at each level of C.
You might want to do a stronger multiplicity correction rather than a separate one for each of the comparisons. For example, test(simp[[1]], by = NULL, adjust = "mvt") puts all 6 of the A comparisons in one family and applies the multivariate $t$ adjustment. (The Tukey adjustment is completely inappropriate for that because it is not a set of pairwise comparisons; in fact, the software doesn't even allow that adjustment.)

Related Solutions

Pairwise Comparisons Using emmeans in Mixed Three-Way Interactions

It shouldn't be necessary to fit a separate model just to do the post-hoc comparisons you want. You had tried:

emms <-  emmeans(fit1b, ~ AB*C)
contrast(emms, interaction = "pairwise")

but you can get the same results from the original model using by variables judiciously:

emms1 <- emmeans(fit1, ~ A*B | C)
con1 <- contrast(emms1, interaction = "pairwise")
pairs(con1, by = NULL)

The con1 results are the desired 1-d.f. interaction effects for each level of C (the by factor is remembered). Then we compare them pairwise, no longer using the by grouping. By default, a Tukey adjustment is made to the family of comparisons, but you may use a different method via adjust.

R Programming – Correcting for Multiple Pairwise Comparisons with GAM Objects in R

The glht() function for generalized linear hypotheses from the multcomp package can be used to carry out various kinds of contrasts using a range of different p-value adjustments. The contrasts you are looking for are also called "Tukey" contrasts for all pairwise comparisons. The p-value adjustments include single-step, Shaffer, Westfall, and all p.adjust methods, see ?summary.glht.

As @GavinSimpson pointed out in the comments: For gam() objects from mgcv this does not work out of the box but requires some manual intervention. For lmer() from lme4 everything works conveniently. I illustrate below how both packages can be used with multcomp to obtain equivalent results. For illustration I use the sleepstudy data from lme4 but collapse the numeric regressor Days to a three-level factor (merely for illustration purposes):

library("lme4")
data("sleepstudy", package = "lme4")
sleepstudy$Days <- cut(sleepstudy$Days, breaks = c(-Inf, 2.5, 5.5, Inf),
  labels = c("low", "med", "high"))
m1 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
summary(m1)
## ...
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  262.170      9.802  26.747
## Daysmed       31.217      6.365   4.905
## Dayshigh      67.433      5.954  11.326
## ...

Then glht() can be used to set up all pairwise (aka Tukey) contrasts for the Days factor. The summary() method then applies the p-value adjustment (single-step, by default).

library("multcomp")
g1 <- glht(m1, linfct = mcp(Days = "Tukey"))
summary(g1)
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## Fit: lmer(formula = Reaction ~ Days + (1 | Subject), data = sleepstudy)
## 
## Linear Hypotheses:
##                 Estimate Std. Error z value Pr(>|z|)    
## med - low == 0    31.217      6.365   4.905 2.28e-06 ***
## high - low == 0   67.433      5.954  11.326  < 1e-06 ***
## high - med == 0   36.216      5.954   6.083  < 1e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

The same model can be fitted with gam() as described in the question.

library("mgcv")
m2 <- gam(Reaction ~ Days + s(Subject, bs = "re"), data = sleepstudy)
summary(m2)
## ...
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  262.170      9.802  26.747  < 2e-16 ***
## Daysmed       31.217      6.365   4.905 2.27e-06 ***
## Dayshigh      67.433      5.954  11.326  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## ...

However, the mcp(Days = "Tukey") method for describing the Tukey contrasts does not cooperate with gam() output and hence fails:

g2 <- glht(m2, linfct = mcp(Days = "Tukey"))
## Error in linfct[[nm]] %*% C : 
##   requires numeric/complex matrix/vector arguments

However, it is not difficult (albeit a bit technical and tedious) to set up the contrast matrix by hand.

contr <- matrix(0, nrow = 3, ncol = length(coef(m2)))
colnames(contr) <- names(coef(m2))
rownames(contr) <- c("med - low", "high - low", "high - med")
contr[, 2:3] <- rbind(c(1, 0), c(0, 1), c(-1, 1))

The first columns of the contrast matrix show what is needed here: As the low coefficient is constrained to zero in the model, med - low is simply med and analogously for high - low. The last row then shows the contrast for high - med:

contr[, 1:5]
##            (Intercept) Daysmed Dayshigh s(Subject).1 s(Subject).2
## med - low            0       1        0            0            0
## high - low           0       0        1            0            0
## high - med           0      -1        1            0            0

And with this contrast matrix we can conduct the pairwise comparison with glht():

g2 <- glht(m2, linfct = contr)
summary(g2)
##   Simultaneous Tests for General Linear Hypotheses
## 
## Fit: gam(formula = Reaction ~ Days + s(Subject, bs = "re"), data = sleepstudy)
## 
## Linear Hypotheses:
##                 Estimate Std. Error z value Pr(>|z|)    
## med - low == 0    31.217      6.365   4.905 2.35e-06 ***
## high - low == 0   67.433      5.954  11.326  < 1e-06 ***
## high - med == 0   36.216      5.954   6.083  < 1e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

Another quite convenient way to indicate the contrasts to be tested is through character strings. This can set up linear functions based on the effect names from names(coef(m2)). And for factors with fewer levels (and hence fewer Tukey contrasts) this works quite nicely - but if the comparisons become more complex it's possibly easier to constract the contrast matrix as above.

g3 <- glht(m2, linfct = c("Daysmed = 0", "Dayshigh = 0", "Dayshigh - Daysmed = 0"))
summary(g3)
##   Simultaneous Tests for General Linear Hypotheses
## 
## Fit: gam(formula = Reaction ~ Days + s(Subject, bs = "re"), data = sleepstudy)
## 
## Linear Hypotheses:
##                         Estimate Std. Error z value Pr(>|z|)    
## Daysmed == 0              31.217      6.365   4.905 2.53e-06 ***
## Dayshigh == 0             67.433      5.954  11.326  < 1e-06 ***
## Dayshigh - Daysmed == 0   36.216      5.954   6.083  < 1e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

Best Answer

Related Solutions

Pairwise Comparisons Using emmeans in Mixed Three-Way Interactions

R Programming – Correcting for Multiple Pairwise Comparisons with GAM Objects in R

Related Question