Kruskal-Wallis Test and Mann-Whitney U Test

distributions, kruskal-wallis-test, nonparametric, wilcoxon-mann-whitney-test

I have three samples of data, each with a non-normal distribution:

[Figure: distributions of the three samples; image not reproduced]

I want to determine whether the samples differ, so I apply the Kruskal-Wallis (KW) test, which gives a significant result. Does this mean the samples come from different populations?

This, however, says nothing about which samples differ. To find out, the Mann-Whitney U (MWU) test can be applied to each pair of samples, i.e. three MWU tests are needed.

This has given me pause for thought:

1) Is it acceptable to apply the MWU test separately to each of the populations? Could this increase the false discovery rate? (Although I appreciate I don't have a large number of populations to compare.)

2) Would there be any point in using the KW test, if I am subsequently going to apply the MWU test to each of the samples?

Best Answer

1a) Is it acceptable to apply the MWU test separately to each of the populations?

Nope: (1) the rank sum test (i.e. $U$ test) ignores the specific rankings used to inform the inference in the Kruskal-Wallis test, and bases inference on a different set of rankings; (2) the rank sum test does not employ the pooled variance implied by the null hypothesis in the Kruskal-Wallis test (think about the t tests post hoc to a one-way ANOVA: they require pooled variance, right? Same here).
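Point (1) can be made concrete with a toy example. The sketch below uses made-up data (the arrays `a`, `b`, `c` are hypothetical, not the asker's samples) and only shows that the ranks assigned to two samples change depending on whether the third sample is part of the ranking:

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical toy data: three small samples with no ties.
a = np.array([1.0, 5.0])
b = np.array([2.0, 6.0])
c = np.array([3.0, 4.0])

# Ranks as the Kruskal-Wallis test sees them: all three samples pooled.
pooled = rankdata(np.concatenate([a, b, c]))
kw_ranks_a = pooled[: len(a)]
kw_ranks_b = pooled[len(a) : len(a) + len(b)]

# Ranks as a rank sum test on (a, b) sees them: sample c is ignored.
pair = rankdata(np.concatenate([a, b]))
mwu_ranks_a = pair[: len(a)]
mwu_ranks_b = pair[len(a) :]

print("pooled ranks   a:", kw_ranks_a, " b:", kw_ranks_b)
print("pairwise ranks a:", mwu_ranks_a, " b:", mwu_ranks_b)
```

Here the observation 5 in `a` has rank 5 in the pooled ranking but rank 3 when only `a` and `b` are ranked together, so the two procedures are not working from the same information.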

Dunn's test, the Conover-Iman test, and the Dwass-Steel-Critchlow-Fligner (DSCF) test are all designed to address (1) and (2). Dunn's test is perhaps more frequently implemented in statistical software.
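For illustration, here is a minimal sketch of Dunn's pairwise statistic as it is usually stated: mean ranks come from the pooled ranking, and the standard error uses the tie-corrected pooled rank variance. The function name `dunn_pairwise` and the example data are my own, and the p-values are unadjusted; treat this as a sketch rather than a replacement for a vetted implementation.

```python
from itertools import combinations

import numpy as np
from scipy.stats import norm, rankdata


def dunn_pairwise(groups):
    """Return {(i, j): (z, unadjusted two-sided p)} for every pair of groups."""
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)

    # Midranks over the pooled data: the same ranking Kruskal-Wallis uses.
    ranks = rankdata(np.concatenate(groups))
    bounds = np.cumsum([0] + sizes)
    mean_ranks = [ranks[bounds[i]:bounds[i + 1]].mean() for i in range(len(groups))]

    # Pooled rank variance with the usual correction for ties.
    _, counts = np.unique(ranks, return_counts=True)
    tie_term = np.sum(counts**3 - counts) / (12.0 * (n_total - 1))
    pooled_var = n_total * (n_total + 1) / 12.0 - tie_term

    results = {}
    for i, j in combinations(range(len(groups)), 2):
        se = np.sqrt(pooled_var * (1.0 / sizes[i] + 1.0 / sizes[j]))
        z = (mean_ranks[i] - mean_ranks[j]) / se
        results[(i, j)] = (z, 2.0 * norm.sf(abs(z)))
    return results


# Hypothetical data, for demonstration only.
example = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for pair, (z, p) in dunn_pairwise(example).items():
    print(pair, round(z, 3), round(p, 4))
```

The p-values returned here still need a multiple comparisons adjustment before they are interpreted as a family.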

1b) Could this increase the false discovery rate? (Although I appreciate I don't have a large number of populations to compare.)

Yup. All the same issues with multiple comparisons arise, so you are correct to be concerned about the familywise error rate, or the false discovery rate. Dunn's test and the Conover-Iman test both permit multiple comparisons adjustments (pick your favorite approach). The DSCF test has control of the FWER built into it, in such a way that I think it would be difficult to subsequently apply FDR-controlling methods.
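As an example of "pick your favorite approach", the sketch below hand-rolls two common adjustments for a list of pairwise p-values: Holm's step-down procedure (controls the FWER) and the Benjamini-Hochberg step-up procedure (controls the FDR). The function names and the three p-values are hypothetical; in practice a library routine would normally be used instead.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values (FWER control), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR control), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):  # walk from the largest p downwards
        idx = order[rank]
        candidate = min(1.0, pvalues[idx] * m / (rank + 1))
        running_min = min(running_min, candidate)  # enforce monotonicity
        adjusted[idx] = running_min
    return adjusted


# Hypothetical unadjusted p-values from three pairwise comparisons.
raw = [0.01, 0.04, 0.03]
print("Holm:", holm_adjust(raw))
print("BH:  ", bh_adjust(raw))
```

With only three comparisons the two adjustments give similar answers; the gap between FWER and FDR control widens as the number of comparisons grows.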

2) Would there be any point in using the KW test, if I am subsequently going to apply the MWU test to each of the samples?

This is a more general question about the nature of omnibus tests relative to post hoc tests, and I refer you to the literature. Off the bat, I can think of two reasons to perform an omnibus test prior to conducting post hoc tests:

  1. Laziness. If the number of pairwise comparisons is very large (it can be as large as $k(k-1)/2$, where $k$ is the number of groups), then a negative result on the omnibus test may indicate that you need not proceed to all the tedium of pairwise tests (although computers help with that), nor proceed to the tedium of interpreting all those pairwise tests (although computers can help somewhat with that), nor proceed to waste a lot of paper presenting all those tests.

  2. The omnibus test gives one a moment where $\alpha$ can be interpreted for a single test, without the complexifications that arise when thinking about, inferring over and communicating about families of tests, or about the nature of false discovery across many tests.
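To put numbers on both reasons, the short sketch below counts the pairwise comparisons for a few values of $k$ and shows how quickly the chance of at least one false positive grows if each test is run at an unadjusted $\alpha$. The error-rate formula assumes independent tests, which pairwise tests sharing groups are not, so it is only a rough illustration; the helper names are my own.

```python
from math import comb


def n_pairwise(k):
    """Number of distinct pairs among k groups: k(k-1)/2."""
    return comb(k, 2)


def fwer_if_independent(m, alpha=0.05):
    """Chance of at least one false positive in m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


for k in (3, 5, 10):
    m = n_pairwise(k)
    print(f"k={k:2d}  pairs={m:2d}  approx. FWER={fwer_if_independent(m):.3f}")
```

For the three groups in the question there are only three pairs, but even then the unadjusted familywise error rate under this rough approximation is about 0.14 rather than 0.05.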

I don't find these especially compelling. How do you normally think about, for example, conducting a one-way ANOVA? The same reasoning applies here. If you have already decided on the pairwise tests a priori, I suppose you can ignore the Kruskal-Wallis test (although, when considering which pairwise test to use, I would give some serious thought to the two points I made in my answer to 1a).

Caveat: The Conover-Iman test is explicit in its assumption that the Kruskal-Wallis null hypothesis has been rejected (indeed, the Kruskal-Wallis test statistic itself is used to compute their t approximations to the actual rank sum distributions).
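The dependence on the Kruskal-Wallis statistic is visible in the formula. The sketch below follows the Conover-Iman statistic as I understand it from Conover's textbook presentation: the pooled rank variance $S^2$ is scaled by $(N-1-H)/(N-k)$, where $H$ is the Kruskal-Wallis statistic, and the result is referred to a t distribution with $N-k$ degrees of freedom. The function name and data are hypothetical, and the p-values are unadjusted.

```python
from itertools import combinations

import numpy as np
from scipy.stats import kruskal, rankdata, t as t_dist


def conover_iman_pairwise(groups):
    """Return {(i, j): (t, unadjusted two-sided p)} for every pair of groups."""
    sizes = [len(g) for g in groups]
    n_total, k = sum(sizes), len(groups)

    # Midranks over the pooled data, as in the Kruskal-Wallis test.
    ranks = rankdata(np.concatenate(groups))
    bounds = np.cumsum([0] + sizes)
    mean_ranks = [ranks[bounds[i]:bounds[i + 1]].mean() for i in range(k)]

    # Kruskal-Wallis H (scipy applies the tie correction).
    h_stat = kruskal(*groups).statistic

    # Variance of the pooled ranks; equals N(N+1)/12 when there are no ties.
    s2 = (np.sum(ranks**2) - n_total * (n_total + 1) ** 2 / 4.0) / (n_total - 1)

    # H enters the pairwise standard error directly.
    scale = s2 * (n_total - 1 - h_stat) / (n_total - k)
    df = n_total - k

    results = {}
    for i, j in combinations(range(k), 2):
        se = np.sqrt(scale * (1.0 / sizes[i] + 1.0 / sizes[j]))
        t_stat = (mean_ranks[i] - mean_ranks[j]) / se
        results[(i, j)] = (t_stat, 2.0 * t_dist.sf(abs(t_stat), df))
    return results


# Hypothetical data, for demonstration only.
example = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for pair, (t_stat, p) in conover_iman_pairwise(example).items():
    print(pair, round(t_stat, 3), round(p, 4))
```

Because a larger $H$ shrinks the standard error, the pairwise statistics are only meaningful once the omnibus test has rejected, which is the assumption the caveat describes.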