Solved – Does R use Tukey or Tukey-Kramer test corrected for unequal sample size and does it use the multivariate t distribution

multiple-comparisonsmultivariate normal distributionrtukey-hsd-test

Does R use the classic Tukey HSD test for balanced data or corrects for unbalanced ones with the Tukey-Kramer approach? I mean the stats::TukeyHSD().

I found, that the Dunnet procedure uses multivariate t-distribution (for 2+ groups). Does Tukey HSD do the same? In R, I found that the multivariate t is calculated using the mvtnorm, which uses Monte Carlo simulations. Indeed, the results of Dunnet vary a little, if the seed is not set. But the outcomes of TukeyHSD are stable.

The R documentation only says that TukeyHSD works for mildly unbalanced data.

Best Answer

In glht, "tukey" doesn't refer to Tukey's HSD. It just means "do all pairwise comparisons". By default, ghlt uses a "single-step" correction method, but other correction methods could be used.

As far as I can tell, the TukeyHSD function uses the Tukey-Kramer procedure. The code for the function can be found on GitHub. See also the example on RPubs.

At least for the simple case of a one-way design with equal variances in groups (but potentially unequal sample sizes), it appears that the results of TukeyHSD will match those of emmeans with a Tukey adjustment, and those of glht with a "single-step" adjustment.

if(!require(emmeans)){install.packages("emmeans")}
if(!require(multcomp)){install.packages("multcomp")}

set.seed(sum(utf8ToInt("Sal2020")))

A = rnorm(12, 5, 2)
B = rnorm(12, 7, 2)
C = rnorm(6,  9, 2)

Y     = c(A, B, C)
Group = factor(c(rep("A", length(A)), rep("B", length(B)), rep("C", length(C))))

AOV = aov(Y ~ Group)

TukeyHSD(AOV)

   ### Tukey multiple comparisons of means
   ###
   ###           diff        lwr      upr     p adj
   ### B-A  3.7733290  1.3694242 6.177234 0.0016529
   ### C-A  3.7450798  0.8009097 6.689250 0.0106113
   ### C-B -0.0282492 -2.9724192 2.915921 0.9996880

model = lm(Y ~ Group)

library(emmeans)

marginal = emmeans(model, ~ Group)

pairs(marginal, adjust="tukey")

    ###  contrast estimate   SE df t.ratio p.value
    ### A - B     -3.7733 0.97 27 -3.892  0.0017 
    ### A - C     -3.7451 1.19 27 -3.154  0.0106 
    ### B - C      0.0282 1.19 27  0.024  0.9997 

    ### P value adjustment: tukey method for comparing a family of 3 estimates.

library(multcomp)

mc = glht(model, mcp(Group = "Tukey"))

summary(mc, test=adjusted("single-step"))

   ### Simultaneous Tests for General Linear Hypotheses
   ### 
   ###            Estimate Std. Error t value Pr(>|t|)   
   ### B - A == 0  3.77333    0.96954   3.892  0.00167 **
   ### C - A == 0  3.74508    1.18744   3.154  0.01053 * 
   ### C - B == 0 -0.02825    1.18744  -0.024  0.99969   
   ### 
   ### (Adjusted p values reported -- single-step method)

Likewise, it appears that the results of emmeans with no adjustment and those of glht with no adjustment, will match those of pairwise.t.test with no adjustment.

pairwise.t.test(Y, Group, p.adjust.method = "none")

   ### Pairwise comparisons using t tests with pooled SD 
   ###
   ###         A       B      
   ### B 0.00059 -      
   ### C 0.00393 0.98120
   ###
   ### P value adjustment method: none

pairs(marginal, adjust="none")

    ### contrast estimate   SE df t.ratio p.value
    ### A - B     -3.7733 0.97 27 -3.892  0.0006 
    ### A - C     -3.7451 1.19 27 -3.154  0.0039 
    ### B - C      0.0282 1.19 27  0.024  0.9812

summary(mc, test=adjusted("none"))

   ### Simultaneous Tests for General Linear Hypotheses
   ###
   ###            Estimate Std. Error t value Pr(>|t|)    
   ### B - A == 0  3.77333    0.96954   3.892 0.000589 ***
   ### C - A == 0  3.74508    1.18744   3.154 0.003927 ** 
   ### C - B == 0 -0.02825    1.18744  -0.024 0.981195 
   ###
   ### (Adjusted p values reported -- none method)
Related Question