Suppose I am doing a one-way fixed effect ANOVA on an unbalanced data set. Let the number of levels of the one way factor be "a".
Suppose the omnibus null hypothesis is rejected.
So I do a post hoc test, say Tukey Kramer HSD.
This will tell me which pairwise means are different. My query is the following: Can I form groups of levels that are NOT significantly different in pairwise tests, and conclude that they form ONE group?
The precise algo for this would be:
For i in 2 : a
For j in 1 : i-1
if mean of level i is not significantly different to the
mean of level j,then put i and j in the same cluster.
After the first time mean of level i is not different
to the mean of level j , just goto the next i , no
need to compare with remaining j's.
Can I use the above to cluster the levels in a data-driven way? If not, can someone tell me what test or package in R will be appropriate for clustering?
Alternately we run the following algo:
For i in 2 : a
For j in 1: n-1
compute the contrast that mean of level i is the same
as the mean of level j.If they are the same then put them in
the same cluster, skip all remaining comparisons in the
inner loop and goto the next iteration in the outer loop.
To control the family wise error rate, since we are doing AT MOST 1 + 2 + … + (n-1 ) = (n-1)(n)/2 we can do a bonferroni or scheffe correction. What correction will be the best given we have AT MOST (n-1)(n)/2 comparisons?
Best Answer
Are you just trying to display the results as a compact letter display? Where, means sharing a letter are not significantly different.
If so, this can be accomplished with the
multcompLetters
function in themultcompView
package.But you may want to use the
emmeans
package for multiple comparisons.Note that Tukey's test controls for the familywise error rate, so further corrections to the p-values are not needed.