Solved – One-way ANOVA, clustering levels using Tukey Kramer HSD

anovartukey-hsd-test

Suppose I am doing a one-way fixed effect ANOVA on an unbalanced data set. Let the number of levels of the one way factor be "a".

Suppose the omnibus null hypothesis is rejected.

So I do a post hoc test, say Tukey Kramer HSD.

This will tell me which pairwise means are different. My query is the following: Can I form groups of levels that are NOT significantly different in pairwise tests, and conclude that they form ONE group?

The precise algo for this would be:

For i in 2 : a
   For j in 1 : i-1
       if mean of level i is not significantly different to the
       mean of level j,then put i and j in the same cluster.
       After the first time mean of level i is not different
       to the mean of level j , just goto the next i , no 
       need to compare with remaining j's.

Can I use the above to cluster the levels in a data-driven way? If not, can someone tell me what test or package in R will be appropriate for clustering?

Alternately we run the following algo:

For i in 2 : a
    For j in 1: n-1
        compute the contrast that mean of level i is the same
        as the mean of level j.If they are the same then put them in              
        the same cluster, skip all remaining comparisons in the                                           
        inner loop and goto the next iteration in the outer loop.

To control the family wise error rate, since we are doing AT MOST 1 + 2 + … + (n-1 ) = (n-1)(n)/2 we can do a bonferroni or scheffe correction. What correction will be the best given we have AT MOST (n-1)(n)/2 comparisons?

Best Answer

Are you just trying to display the results as a compact letter display? Where, means sharing a letter are not significantly different.

If so, this can be accomplished with the multcompLetters function in the multcompView package.

But you may want to use the emmeans package for multiple comparisons.

Note that Tukey's test controls for the familywise error rate, so further corrections to the p-values are not needed.

if(!require(multcompView)){install.packages("multcompView")}
if(!require(emmeans)){install.packages("emmeans")}

fm1 = aov(breaks ~ wool + tension, data = warpbreaks)

TH = TukeyHSD(fm1, "tension", ordered = TRUE)

TH

P.value = TH$tension[,4]

library(multcompView)

multcompLetters(P.value)

###   M   L   H 
### "a" "b" "a" 

library(emmeans)

marginal = emmeans(fm1, ~ tension)

cld(marginal, Letters=letters)


###  tension   emmean       SE df lower.CL upper.CL .group
###  H       21.66667 2.738184 50 16.16686 27.16647  a    
###  M       26.38889 2.738184 50 20.88908 31.88869  a    
###  L       36.38889 2.738184 50 30.88908 41.88869   b   
###
### Results are averaged over the levels of: wool 
### Confidence level used: 0.95 
### P value adjustment: tukey method for comparing a family of 3 estimates 
### significance level used: alpha = 0.05