Solved – Does R use the Tukey or the Tukey-Kramer test (corrected for unequal sample sizes), and does it use the multivariate t distribution?

Tags: multiple-comparisons, multivariate normal distribution, r, tukey-hsd-test

Does R use the classic Tukey HSD test, which assumes balanced data, or does it correct for unbalanced data with the Tukey-Kramer approach? I mean stats::TukeyHSD().

I found that the Dunnett procedure uses the multivariate t distribution (for 2+ groups). Does Tukey HSD do the same? In R, I found that the multivariate t is calculated with the mvtnorm package, which uses Monte Carlo simulation. Indeed, the Dunnett results vary slightly if the seed is not set, but the outcomes of TukeyHSD are stable.

The R documentation only says that TukeyHSD works for mildly unbalanced data.

Best Answer

In glht, "Tukey" doesn't refer to Tukey's HSD; it just means "do all pairwise comparisons". By default, glht uses a "single-step" correction method, but other correction methods can be used.
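To illustrate what "all pairwise comparisons" means, here is a base-R sketch of the corresponding contrast matrix. The helper pairwise_contrasts() is hypothetical; multcomp builds its own matrix internally (via contrMat()), and its exact layout may differ. Each row compares one pair of group means, and the "single-step" adjustment then evaluates all rows jointly under the multivariate t distribution, which multcomp obtains through mvtnorm.

```r
# Hypothetical helper (not part of multcomp): contrast matrix for
# all pairwise comparisons among the given factor levels
pairwise_contrasts <- function(lev) {
  idx <- combn(seq_along(lev), 2)               # column m holds the pair (i, j) with i < j
  m   <- ncol(idx)
  K   <- matrix(0, nrow = m, ncol = length(lev),
                dimnames = list(paste(lev[idx[2, ]], "-", lev[idx[1, ]]), lev))
  K[cbind(seq_len(m), idx[2, ])] <-  1          # later level enters with +1
  K[cbind(seq_len(m), idx[1, ])] <- -1          # earlier level enters with -1
  K
}

K <- pairwise_contrasts(c("A", "B", "C"))
K
nrow(pairwise_contrasts(LETTERS[1:5]))          # choose(5, 2) = 10 comparisons
```

Multiplying K by the vector of group means gives the pairwise differences, in the same order ("B - A", "C - A", "C - B") as the Estimate column of the glht output.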

As far as I can tell, the TukeyHSD function uses the Tukey-Kramer procedure. The code for the function can be found on GitHub. See also the example on RPubs.
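The Tukey-Kramer claim can be checked by hand. The sketch below, assuming a one-way design and reusing this answer's data-generating code, divides each difference in means by the Tukey-Kramer standard error sqrt(MSE/2 * (1/n_i + 1/n_j)) and refers it to the studentized range distribution with ptukey(). As far as I can tell this mirrors what stats:::TukeyHSD.aov does. Because ptukey() evaluates the distribution by numerical integration rather than simulation, this would also explain why TukeyHSD results are stable without a seed.

```r
# Same data-generating code as the example in this answer (group sizes 12, 12, 6)
set.seed(sum(utf8ToInt("Sal2020")))
Y     <- c(rnorm(12, 5, 2), rnorm(12, 7, 2), rnorm(6, 9, 2))
Group <- factor(rep(c("A", "B", "C"), times = c(12, 12, 6)))
fit   <- aov(Y ~ Group)

means  <- c(tapply(Y, Group, mean))
n      <- c(tapply(Y, Group, length))
df_res <- fit$df.residual
mse    <- sum(residuals(fit)^2) / df_res         # pooled within-group variance

# All pairs, ordered and labelled the way TukeyHSD prints them (B-A, C-A, C-B)
pairs <- combn(levels(Group), 2)
diffs <- means[pairs[2, ]] - means[pairs[1, ]]

# Tukey-Kramer: each pair gets its own standard error from its two group sizes
se_tk <- sqrt(mse / 2 * (1 / n[pairs[1, ]] + 1 / n[pairs[2, ]]))

# Upper tail of the studentized range; ptukey() integrates numerically, no simulation
p_tk <- ptukey(abs(diffs) / se_tk, nmeans = nlevels(Group), df = df_res,
               lower.tail = FALSE)
names(p_tk) <- paste(pairs[2, ], pairs[1, ], sep = "-")

cbind(by_hand = p_tk, TukeyHSD = TukeyHSD(fit)$Group[, "p adj"])
```

With equal group sizes the per-pair standard error reduces to a single value, so the same code gives the classic Tukey HSD as a special case.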

At least for the simple case of a one-way design with equal variances in groups (but potentially unequal sample sizes), it appears that the results of TukeyHSD match those of emmeans with a Tukey adjustment, and agree closely with those of glht with a "single-step" adjustment.

if(!require(emmeans)){install.packages("emmeans")}
if(!require(multcomp)){install.packages("multcomp")}

set.seed(sum(utf8ToInt("Sal2020")))

A = rnorm(12, 5, 2)
B = rnorm(12, 7, 2)
C = rnorm(6,  9, 2)

Y     = c(A, B, C)
Group = factor(c(rep("A", length(A)), rep("B", length(B)), rep("C", length(C))))

AOV = aov(Y ~ Group)

TukeyHSD(AOV)

   ### Tukey multiple comparisons of means
   ###
   ###           diff        lwr      upr     p adj
   ### B-A  3.7733290  1.3694242 6.177234 0.0016529
   ### C-A  3.7450798  0.8009097 6.689250 0.0106113
   ### C-B -0.0282492 -2.9724192 2.915921 0.9996880

model = lm(Y ~ Group)

library(emmeans)

marginal = emmeans(model, ~ Group)

pairs(marginal, adjust="tukey")

    ###  contrast estimate   SE df t.ratio p.value
    ### A - B     -3.7733 0.97 27 -3.892  0.0017 
    ### A - C     -3.7451 1.19 27 -3.154  0.0106 
    ### B - C      0.0282 1.19 27  0.024  0.9997 

    ### P value adjustment: tukey method for comparing a family of 3 estimates.

library(multcomp)

mc = glht(model, mcp(Group = "Tukey"))

summary(mc, test=adjusted("single-step"))

   ### Simultaneous Tests for General Linear Hypotheses
   ### 
   ###            Estimate Std. Error t value Pr(>|t|)   
   ### B - A == 0  3.77333    0.96954   3.892  0.00167 **
   ### C - A == 0  3.74508    1.18744   3.154  0.01053 * 
   ### C - B == 0 -0.02825    1.18744  -0.024  0.99969   
   ### 
   ### (Adjusted p values reported -- single-step method)

Likewise, it appears that the unadjusted results of emmeans and of glht match those of pairwise.t.test with no adjustment.
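The common computation behind those matching unadjusted results appears to be an ordinary t test with a pooled error variance. The base-R sketch below assumes a one-way design: each difference in means is divided by sqrt(MSE * (1/n_i + 1/n_j)) and compared with a t distribution on the residual degrees of freedom (27 here), which is also what pairwise.t.test does with its default pool.sd.

```r
# Same data-generating code as the example in this answer
set.seed(sum(utf8ToInt("Sal2020")))
Y     <- c(rnorm(12, 5, 2), rnorm(12, 7, 2), rnorm(6, 9, 2))
Group <- factor(rep(c("A", "B", "C"), times = c(12, 12, 6)))
fit   <- lm(Y ~ Group)

means  <- c(tapply(Y, Group, mean))
n      <- c(tapply(Y, Group, length))
df_res <- fit$df.residual                        # N - k degrees of freedom
mse    <- sum(residuals(fit)^2) / df_res         # pooled variance, as with pool.sd = TRUE

pairs  <- combn(levels(Group), 2)
t_stat <- (means[pairs[2, ]] - means[pairs[1, ]]) /
          sqrt(mse * (1 / n[pairs[1, ]] + 1 / n[pairs[2, ]]))
p_raw  <- 2 * pt(-abs(t_stat), df_res)           # two-sided, no multiplicity adjustment
names(p_raw) <- paste(pairs[2, ], pairs[1, ], sep = "-")
p_raw
```

Multiplying |t| by sqrt(2) turns it into the studentized range statistic used by the Tukey adjustment, which is how the adjusted and unadjusted tables above are related.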

pairwise.t.test(Y, Group, p.adjust.method = "none")

   ### Pairwise comparisons using t tests with pooled SD 
   ###
   ###         A       B      
   ### B 0.00059 -      
   ### C 0.00393 0.98120
   ###
   ### P value adjustment method: none

pairs(marginal, adjust="none")

    ### contrast estimate   SE df t.ratio p.value
    ### A - B     -3.7733 0.97 27 -3.892  0.0006 
    ### A - C     -3.7451 1.19 27 -3.154  0.0039 
    ### B - C      0.0282 1.19 27  0.024  0.9812

summary(mc, test=adjusted("none"))

   ### Simultaneous Tests for General Linear Hypotheses
   ###
   ###            Estimate Std. Error t value Pr(>|t|)    
   ### B - A == 0  3.77333    0.96954   3.892 0.000589 ***
   ### C - A == 0  3.74508    1.18744   3.154 0.003927 ** 
   ### C - B == 0 -0.02825    1.18744  -0.024 0.981195 
   ###
   ### (Adjusted p values reported -- none method)

Related Solutions

Tukey HSD Test – Does the Tukey HSD Test Correct for Multiple Comparisons?

It is not necessary to apply a further correction for multiple comparisons when using Tukey's HSD. The procedure was developed specifically to account for multiple comparisons and keeps the experiment-wise alpha at the specified level (conventionally .05). Page 210 of Maxwell and Delaney's book on experimental design has explanations and examples of the procedure.
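A quick simulation illustrates the claim. It assumes a one-way design with four groups of 10 normal observations and no true differences between groups (the seed and sizes are arbitrary choices). The estimated familywise error rate, the share of data sets in which at least one pairwise comparison comes out "significant", should stay near .05 with TukeyHSD and be noticeably larger with unadjusted pooled-SD t tests.

```r
set.seed(2024)
k <- 4; n <- 10; n_sim <- 1000

# For each simulated data set, record whether ANY pairwise comparison is significant
any_sig <- replicate(n_sim, {
  y <- rnorm(k * n)                              # all group means equal: global null is true
  g <- factor(rep(seq_len(k), each = n))
  p_tukey <- TukeyHSD(aov(y ~ g))$g[, "p adj"]
  p_raw   <- pairwise.t.test(y, g, p.adjust.method = "none")$p.value
  c(tukey = any(p_tukey < 0.05), none = any(p_raw < 0.05, na.rm = TRUE))
})

fwer <- rowMeans(any_sig)                        # estimated familywise error rate per method
fwer
```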

ANOVA – Effect Size Calculation for One-Way ANOVA and Tukey-HSD Tests

I was not able to reproduce the results you got from WebPower using the pilot data you supplied. I was able to reproduce your R code however.

You are correct that you can't use $\eta^2$ directly as Cohen's $f$, but the two are related by $f^2 = \frac{\eta^2}{1-\eta^2}$.

"However, how should I compute the effect size from the pilot study?" Use the $\eta^2$ from the pilot study.
"Why are there interaction effect sizes, i.e., the effect size for group x vs group y?" Those are the effect sizes for the pairwise comparisons (as you would use with a t-test or TukeyHSD).
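In terms of the one-way ANOVA table, where $SS_{\text{total}} = SS_{\text{effect}} + SS_{\text{error}}$, the conversion from $\eta^2$ to Cohen's $f$ reduces to a ratio of sums of squares. This makes it easy to check which row of the table belongs in the numerator:

$$\eta^2 = \frac{SS_{\text{effect}}}{SS_{\text{total}}}, \qquad f^2 = \frac{\eta^2}{1-\eta^2} = \frac{SS_{\text{effect}}/SS_{\text{total}}}{SS_{\text{error}}/SS_{\text{total}}} = \frac{SS_{\text{effect}}}{SS_{\text{error}}}$$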

require(dplyr)
require(reshape2)

pilot <- data.frame(option1 = c(6.3, 2.8, 7.8, 7.9, 4.9),
                    option2 = c(9.9, 4.1, 3.9, 6.3, 6.9),
                    option3 = c(5.1, 2.9, 3.6, 5.7, 4.5),
                    option4 = c(1.0, 2.8, 4.8, 3.9, 1.6))
pilot2 <- pilot %>% 
  reshape2::melt(value.name = "y") %>%
  dplyr::rename("option" = "variable")

lm1 <- lm(y ~ option, data = pilot2)
aov1 <- aov(lm1)

means <- apply(pilot, 2, mean)
vs <- apply(pilot, 2, var)

# cohen's f for overall anova
# eta^2 = SS_effect / SS_total
#   (row 1 of anova(lm1) is "option", row 2 is "Residuals")
eta.sq <- anova(lm1)$`Sum Sq`[1] / sum(anova(lm1)$`Sum Sq`)
f <- sqrt(eta.sq / (1-eta.sq))

# cohen's d for pairwise comparisons (pooled SD with n1 + n2 - 2 degrees of freedom)
d <- abs(means[c(1,1,1,2,2,3)] - means[c(2,3,4,3,4,4)]) / sqrt(((5-1)*vs[c(1,1,1,2,2,3)] + (5-1)*vs[c(2,3,4,3,4,4)])/ (5+5-2))
names(d) <- c("1-2", "1-3", "1-4", "2-3", "2-4", "3-4")

require(pwr)

# with 5 per group, we have 80% power to detect an effect size of f = 0.835
#  i.e. with only 5 per group, the effect must be large to be detected

pwr::pwr.anova.test(k = 4, n = 5, sig.level = 0.05, power = 0.80)
#> 
#>      Balanced one-way analysis of variance power calculation 
#> 
#>               k = 4
#>               n = 5
#>               f = 0.8352722
#>       sig.level = 0.05
#>           power = 0.8
#> 
#> NOTE: n is number in each group

# the pilot effect size computed above is f ≈ 0.81, just below 0.835,
#   so 5 per group is not quite enough; the n returned here rounds up to 6 per group
pwr::pwr.anova.test(k = 4, f = f, sig.level = 0.05, power = 0.80)
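pwr.anova.test() solves a power equation numerically. As a sketch of what it computes, assuming the standard noncentral F formulation with noncentrality k * n * f^2 (which, as far as I know, is what pwr uses), the power can be written directly with qf() and pf(). The helpers anova_power() and min_n() are hypothetical and not part of pwr.

```r
# Hypothetical helper (not from pwr): power of the balanced one-way ANOVA F test
# with k groups of n observations each and Cohen's effect size f
anova_power <- function(k, n, f, alpha = 0.05) {
  df1    <- k - 1
  df2    <- (n - 1) * k
  lambda <- k * n * f^2                              # noncentrality parameter
  f_crit <- qf(alpha, df1, df2, lower.tail = FALSE)  # critical value under H0
  pf(f_crit, df1, df2, ncp = lambda, lower.tail = FALSE)
}

# Hypothetical helper: smallest whole group size that reaches the target power
min_n <- function(k, f, power = 0.80, alpha = 0.05) {
  n <- 2
  while (anova_power(k, n, f, alpha) < power) n <- n + 1
  n
}

anova_power(k = 4, n = 5, f = 0.8352722)
min_n(k = 4, f = 0.5)
```

The first call should give about 0.80, matching the first pwr.anova.test() result above. Because pwr reports a fractional n, rounding it up (as min_n() does) gives the number of subjects to recruit per group.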
