Nice question! Let's clear up some potential confusion first. Dunn's test (Dunn, 1964) is precisely that: a test statistic that is a nonparametric analog to the pairwise t tests one would conduct post hoc to an ANOVA. It is similar to the Mann-Whitney-Wilcoxon rank sum test, except that (1) it employs a measure of the pooled variance implied by the null hypothesis of the Kruskal-Wallis test, and (2) it uses the same rankings of one's original data as are used by the Kruskal-Wallis test.
Dunn also developed what is commonly referred to as the Bonferroni adjustment for multiple comparisons (Dunn, 1961), one of many methods to control the family-wise error rate (FWER) that have since been developed; it simply entails dividing $\alpha$ (one-tailed tests) or $\alpha/2$ (two-tailed tests) by the number of pairwise comparisons one is making. The maximum number of pairwise comparisons one may make among $k$ groups is $k(k-1)/2$, so that's $17 \times 16/2 = 136$ possible pairwise comparisons, implying that you might be able to reject a null hypothesis for any single test only if $p \le \alpha/2/136$. Your concern about power is therefore warranted for this method.
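To make the arithmetic concrete, here is a minimal Python sketch (variable names are mine, not from any package) of the comparison count and the resulting Bonferroni per-test threshold for two-tailed tests:

```python
from math import comb

# Maximum number of pairwise comparisons among k groups: k*(k-1)/2
k = 17
m = comb(k, 2)  # 136 comparisons

# Bonferroni per-test rejection threshold for two-tailed tests at alpha = 0.05
alpha = 0.05
threshold = alpha / 2 / m  # = 0.025 / 136, roughly 1.84e-4
```

At that threshold, only extremely small p-values survive, which is exactly the power concern noted above.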
Other methods to control the FWER exist with more statistical power, however. For example, the Holm and Holm-Šidák stepwise methods (Holm, 1979) do not hemorrhage power the way the Bonferroni method does. Alternatively, you could aim to control the false discovery rate (FDR) instead; these methods, the Benjamini-Hochberg (1995) and Benjamini-Yekutieli (2001) procedures, generally give more statistical power by assuming that some null hypotheses are false (i.e., by building the idea that not all rejections are false rejections into sequentially modified rejection criteria). These and other multiple comparison adjustments are implemented specifically for Dunn's test in Stata in the `dunntest` package (within Stata type `net describe dunntest, from(https://alexisdinno.com/stata)`), and in R in the `dunn.test` package.
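To make the contrast with Bonferroni concrete, here is a minimal Python sketch of the Holm (step-down) and Benjamini-Hochberg (step-up) adjusted p-value computations; the function names are mine, and a real analysis would normally rely on the package implementations mentioned above:

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values: multiply the i-th smallest p-value
    by (m - i + 1), enforce monotonicity, and cap at 1."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        running_max = max(running_max, (m - rank) * p_values[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted

def bh_adjust(p_values):
    """Benjamini-Hochberg step-up adjusted p-values (q-values): multiply the
    i-th smallest p-value by m/i, then take cumulative minima from the top."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):
        idx = order[rank]
        running_min = min(running_min, p_values[idx] * m / (rank + 1))
        adjusted[idx] = running_min
    return adjusted
```

For p-values (0.01, 0.02, 0.04), Holm gives adjusted values of about (0.03, 0.04, 0.04) and Benjamini-Hochberg about (0.03, 0.03, 0.04), both of which should agree with R's `p.adjust` with `method = "holm"` and `method = "BH"`.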
In addition, there is an alternative to Dunn's test (which is based on an approximate z test statistic): the Conover-Iman test, for use exclusively post hoc to a rejected Kruskal-Wallis test, which is based on a t distribution and is more powerful than Dunn's test (Conover & Iman, 1979; Conover, 1999). One can also use the methods to control the FWER or the FDR with the Conover-Iman test, which is implemented for Stata in the `conovertest` package (within Stata type `net describe conovertest, from(https://alexisdinno.com/stata)`), and for R in the `conover.test` package.
References
Benjamini, Y. and Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1):289–300.
Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29(4):1165–1188.
Conover, W. J. (1999). Practical Nonparametric Statistics. Wiley, Hoboken, NJ, 3rd edition.
Conover, W. J. and Iman, R. L. (1979). On multiple-comparisons procedures. Technical Report LA-7677-MS, Los Alamos Scientific Laboratory.
Dunn, O. J. (1961). Multiple comparisons among means. Journal of the American Statistical Association, 56(293):52–64.
Dunn, O. J. (1964). Multiple comparisons using rank sums. Technometrics, 6(3):241–252.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.
Author of `dunn.test` here. As per the help file for `dunn.test`, under the `sidak` option for the `method` argument:
the FWER is controlled using Šidák's (1967) adjustment, and adjusted p-values = min(1, 1 − (1 − p)^m).
So, yes: `dunn.test` corrects p-values for multiple comparisons given the argument to `method`. For the Šidák option, you would reject each null hypothesis if $p \le \alpha/2$ (i.e., `dunn.test` reports one-sided p-values).
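As a quick check of the Šidák formula (adjusted p-values capped at 1), the adjustment can be computed directly; this Python sketch uses an illustrative function name and made-up p-values:

```python
def sidak_adjust(p_values):
    """Sidak FWER-adjusted p-values: p_adj = min(1, 1 - (1 - p)^m),
    where m is the number of comparisons in the family."""
    m = len(p_values)
    return [min(1.0, 1.0 - (1.0 - p) ** m) for p in p_values]

# With six pairwise comparisons, an unadjusted p-value of 0.01 becomes
# 1 - 0.99**6, roughly 0.0585.
adjusted = sidak_adjust([0.01, 0.20, 0.50, 0.70, 0.90, 0.95])
```

Note that the adjustment grows quickly with the family size $m$, which is why even modest numbers of groups inflate adjusted p-values noticeably.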
Also from the help file:

`method` adjusts the p-value for multiple comparisons using the Bonferroni, Šidák, Holm, Holm-Šidák, Hochberg, Benjamini-Hochberg, or Benjamini-Yekutieli adjustment (see Details). The default is no adjustment for multiple comparisons.
For the stepwise multiple comparisons adjustment methods, because rejection depends both on adjusted p-values (sometimes called q-values) and the ordering of unadjusted p-values, rejection decisions for a specified level of $\alpha$ are starred in the output.
The advantage of not starring output for the static (non-stepwise) adjustment results is that the adjusted p-values can be interpreted by the reader according to their own preference for $\alpha$.
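To illustrate why stepwise rejection depends on the ordering of unadjusted p-values, here is a minimal Python sketch of Holm's step-down decision rule (the function name and example p-values are illustrative, not taken from the package):

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down rejection decisions: test hypotheses in order of
    increasing p-value against alpha/(m - rank); stop at the first failure
    and retain all remaining hypotheses."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # later hypotheses are retained regardless of their p-values
    return reject
```

For p-values (0.04, 0.01, 0.03) at $\alpha = 0.05$, only the 0.01 test is rejected: the 0.03 test fails its step threshold of $0.05/2$, so the procedure stops and the 0.04 test is never even evaluated. This is why rejection decisions are starred in the output rather than left for the reader to infer.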
The output following the Kruskal-Wallis test provides all possible pairwise comparisons (six in the case of four groups), so the one on the first row compares group B with group A, the first on the second row compares group C with group A, etc.
The upper number for each comparison is Dunn's pairwise z test statistic. The lower number in this example is the raw p-value associated with the test (i.e., you would compare it to $\alpha/2$), although this p-value changes depending on the family-wise error rate or false discovery rate multiple comparison adjustment option. For stepwise multiple comparison adjustments (e.g., Holm, Benjamini-Hochberg, etc.), the adjusted p-values will have an asterisk next to them if you would reject the null hypothesis at the specified significance level (which is not necessarily directly indicated by the adjusted p-values, since rejection depends on ordering; see the documentation and citations therein for more details).
I am the author of this package (emailing me, as explicitly indicated in the documentation, would likely be the best way to get in touch with me directly).