Solved – Multiple comparisons with many groups

Tags: dunn-test, kruskal-wallis-test, multiple-comparisons, post-hoc

I would like to determine whether a multiple comparisons test would be appropriate for my data. I used the Kruskal-Wallis test to determine whether there were differences in mean inhibition among $17$ different groups. The analysis revealed significant differences, and now I would like to use a multiple comparison procedure (perhaps Dunn's, since I have unequal sample sizes) to see which groups differ from the rest.

Since I have many groups ($k = 17$), would a multiple comparisons test have very little power, or be inappropriate for this data set?

Best Answer

Nice question! Let's clear up some potential confusion first. Dunn's test (Dunn, 1964) is precisely that: a test statistic that is a nonparametric analog to the pairwise t test one would conduct post hoc to an ANOVA. It is similar to the Mann-Whitney-Wilcoxon rank sum test, except that (1) it employs a measure of the pooled variance that is implied by the null hypothesis of the Kruskal-Wallis test, and (2) it uses the same rankings of one's original data as are used by the Kruskal-Wallis test.
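To make the two points above concrete, here is a minimal sketch of the pairwise statistic, assuming the usual form of Dunn's z: the difference in mean ranks (taken from one ranking of all the pooled data) divided by a standard error built from the tie-corrected pooled variance $N(N+1)/12 - \sum_s (t_s^3 - t_s)/(12(N-1))$. The function names (`midranks`, `dunn_pairwise`) and the toy data are hypothetical, and the p-values are one-sided and unadjusted, so they would be compared with $\alpha/2$ before any multiple comparisons adjustment. For real analyses the packages mentioned below are the better choice.

```python
import math
from itertools import combinations


def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def dunn_pairwise(groups):
    """groups: dict of name -> list of observations.

    Returns {(name_a, name_b): (z, one_sided_p)} for every pair of groups.
    """
    names = list(groups)
    pooled = [x for name in names for x in groups[name]]
    n_total = len(pooled)
    ranks = midranks(pooled)  # one ranking of all data, as in Kruskal-Wallis

    # Mean rank of each group.
    mean_rank = {}
    start = 0
    for name in names:
        n = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + n]) / n
        start += n

    # Pooled variance implied by the Kruskal-Wallis null, corrected for ties.
    tie_sizes = {}
    for x in pooled:
        tie_sizes[x] = tie_sizes.get(x, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values()) / (12 * (n_total - 1))
    pooled_var = n_total * (n_total + 1) / 12 - tie_term

    results = {}
    for a, b in combinations(names, 2):
        se = math.sqrt(pooled_var * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p_one_sided = 0.5 * math.erfc(abs(z) / math.sqrt(2))  # P(Z >= |z|)
        results[(a, b)] = (z, p_one_sided)
    return results


# Hypothetical toy data: three groups of three observations, no ties.
example = {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0], "c": [7.0, 8.0, 9.0]}
results = dunn_pairwise(example)
for pair, (z, p) in results.items():
    print(pair, round(z, 3), round(p, 4))
```

Note that unequal group sizes only enter through the $1/n_i + 1/n_j$ term of the standard error, which is why unequal sample sizes are not a problem for this test.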

Dunn also developed what is commonly referred to as the Bonferroni adjustment for multiple comparisons (Dunn, 1961). It is one of many methods to control the family-wise error rate (FWER), and simply entails dividing $\alpha$ (one-tailed tests) or $\alpha/2$ (two-tailed tests) by the number of pairwise comparisons one is making. The maximum number of pairwise comparisons one may make among $k$ groups is $k(k-1)/2$, so with $k = 17$ that is $17 \times 16/2 = 136$ possible pairwise comparisons, implying that you could reject the null hypothesis for any single two-tailed test only if $p \le (\alpha/2)/136$. Your concern about power is therefore warranted for this method.
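The arithmetic can be checked in a few lines. This is only an illustration: $k = 17$ comes from the question, while $\alpha = 0.05$ is an assumed significance level that the question does not state.

```python
from math import comb

k = 17          # number of groups, from the question
alpha = 0.05    # assumed significance level; not stated in the question

m = comb(k, 2)  # k(k-1)/2 possible pairwise comparisons
per_test_two_tailed = (alpha / 2) / m  # Bonferroni rejection threshold per test

print(m)  # 136
print(f"{per_test_two_tailed:.6f}")
```

Under that assumed $\alpha$, each individual comparison would need a p-value below roughly two in ten thousand to be rejected.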

However, other methods to control the FWER have more statistical power. For example, the Holm and Holm-Sidak stepwise methods (Holm, 1979) do not hemorrhage power the way the Bonferroni method does. Alternatively, you could aim to control the false discovery rate (FDR) instead. FDR methods, such as those of Benjamini and Hochberg (1995) and Benjamini and Yekutieli (2001), generally give more statistical power by assuming that some null hypotheses are false (i.e., by building the idea that not all rejections are false rejections into sequentially modified rejection criteria). These and other multiple comparisons adjustments are implemented specifically for Dunn's test in Stata in the dunntest package (within Stata, type net describe dunntest, from(https://alexisdinno.com/stata)), and in R in the dunn.test package.
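To see how the power differs, here is a rough sketch that computes Bonferroni, Holm, and Benjamini-Hochberg adjusted p-values by hand and counts rejections. The six p-values and the 0.05 cutoff are made up for illustration, the function names are hypothetical, and this is not how the dunntest or dunn.test packages are implemented internally.

```python
def bonferroni(pvals):
    """Multiply every p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Step-down: the i-th smallest p-value is multiplied by (m - i + 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):  # rank 0 is the smallest p-value
        candidate = min(1.0, (m - rank) * pvals[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


def benjamini_hochberg(pvals):
    """Step-up FDR: the i-th smallest p-value is multiplied by m / i."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):  # walk from the largest p-value down
        idx = order[rank]
        candidate = min(1.0, pvals[idx] * m / (rank + 1))
        running_min = min(running_min, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_min
    return adjusted


# Hypothetical unadjusted p-values from six pairwise comparisons.
pvals = [0.001, 0.008, 0.012, 0.030, 0.040, 0.060]
cutoff = 0.05  # assumed rejection level for the adjusted p-values

rejections = {
    name: sum(p <= cutoff for p in method(pvals))
    for name, method in [
        ("bonferroni", bonferroni),
        ("holm", holm),
        ("benjamini_hochberg", benjamini_hochberg),
    ]
}
print(rejections)  # {'bonferroni': 2, 'holm': 3, 'benjamini_hochberg': 5}
```

With these made-up inputs, Holm rejects at least as much as Bonferroni while still controlling the FWER, and the FDR procedure rejects more again, at the price of controlling a different error rate.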

In addition, there is an alternative to Dunn's test (which is based on an approximate z test statistic): the Conover-Iman test, which is used exclusively post hoc to a rejected Kruskal-Wallis test, is based on a t distribution, and is more powerful than Dunn's test (Conover & Iman, 1979; Conover, 1999). One can also use the methods to control the FWER or the FDR with the Conover-Iman tests; these are implemented for Stata in the conovertest package (within Stata, type net describe conovertest, from(https://alexisdinno.com/stata)), and for R in the conover.test package.

References

Benjamini, Y. and Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1):289–300.

Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29(4):1165–1188.

Conover, W. J. (1999). Practical Nonparametric Statistics. Wiley, Hoboken, NJ, 3rd edition.

Conover, W. J. and Iman, R. L. (1979). On multiple-comparisons procedures. Technical Report LA-7677-MS, Los Alamos Scientific Laboratory.

Dunn, O. J. (1961). Multiple comparisons among means. Journal of the American Statistical Association, 56(293):52–64.

Dunn, O. J. (1964). Multiple comparisons using rank sums. Technometrics, 6(3):241–252.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.
