Can the Mann-Whitney Test Be Used for Post Hoc Comparisons After Kruskal-Wallis?

dunn-test, hypothesis-testing, kruskal-wallis-test, post-hoc, wilcoxon-mann-whitney-test

I have a simulation where an animal is placed in a hostile environment and timed to see how long it can survive using some approach to survival. There are three approaches it can use to survive. I ran 300 simulations of the animal using each survival approach. All simulations take place in the same environment but there's some randomness so it's different each time. I time how many seconds the animal survives in each simulation. Living longer is better. My data looks like this:

Approach 1, Approach 2, Approach 3
45,79,38
48,32,24
85,108,44
... 300 rows of these

I'm unsure of everything I do after this point, so let me know if I'm doing something stupid or wrong. I'm trying to find out whether there's a statistically significant difference in lifespan between the approaches.

I ran a Shapiro test on each of the samples and they came back with tiny p-values, so I believe the data isn't normally distributed.

The values within a row have no relationship to each other. The random seed used for each simulation was different. As a result, I believe the data isn't paired.

Because the data is not normally distributed, not paired, and there are more than two samples, I ran a Kruskal-Wallis test, which came back with a p-value of 0.048. I then moved on to a post hoc test, selecting Mann-Whitney. I'm really not sure if Mann-Whitney should be used here.

I compared each survival approach with each other approach by performing the Mann-Whitney test, i.e. {(approach 1, approach 2), (approach 1, approach 3), (approach 2, approach 3)}. For the pair (approach 2, approach 3), a two-tailed test found no statistically significant difference, but a one-tailed test did.

Problems:

  1. I don't know if using Mann Whitney like this makes sense.
  2. I don't know if I should be using a one-tailed or two-tailed Mann-Whitney.

Best Answer

No, you should not use the Mann-Whitney $U$ test in this circumstance.

Here's why: Dunn's test is an appropriate post hoc test* following rejection of a Kruskal-Wallis test. If one proceeds by moving from a rejection of Kruskal-Wallis to performing ordinary pair-wise rank sum (i.e. Wilcoxon or Mann-Whitney) tests, then two problems obtain: (1) the ranks used for the pair-wise rank sum tests are not the ranks used by the Kruskal-Wallis test, because each pair-wise test re-ranks only the two groups being compared; and (2) the rank sum tests do not use the pooled variance implied by the Kruskal-Wallis null hypothesis. Dunn's test does not have these problems.
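To make those two points concrete, here is a minimal pure-Python sketch of Dunn's test, assuming the usual large-sample z approximation with a tie correction. The names `average_ranks` and `dunn_test` are illustrative and not taken from any package, and the data are only the three rows shown in the question, which is far too few for real inference. No multiple-comparison adjustment is applied in this sketch.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values 1..N, giving tied values the average of their positions.

    Returns (ranks in input order, list of tie-group sizes).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        tie_sizes.append(end - start + 1)
        start = end + 1
    return ranks, tie_sizes


def dunn_test(groups):
    """groups: dict of name -> observations. Returns {(a, b): (z, two_sided_p)}."""
    names = list(groups)
    pooled = [x for name in names for x in groups[name]]
    # Point (1): rank ALL observations together, as Kruskal-Wallis does.
    ranks, tie_sizes = average_ranks(pooled)
    n_total = len(pooled)

    mean_rank = {}
    pos = 0
    for name in names:
        n = len(groups[name])
        mean_rank[name] = sum(ranks[pos:pos + n]) / n
        pos += n

    # Point (2): one pooled rank variance (with tie correction) for every pair.
    tie_term = sum(t**3 - t for t in tie_sizes) / (12 * (n_total - 1))
    pooled_var = n_total * (n_total + 1) / 12 - tie_term

    results = {}
    for a, b in combinations(names, 2):
        se = sqrt(pooled_var * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided, unadjusted
        results[(a, b)] = (z, p)
    return results


# The three rows shown in the question (illustration only).
groups = {
    "approach_1": [45, 48, 85],
    "approach_2": [79, 32, 108],
    "approach_3": [38, 24, 44],
}
results = dunn_test(groups)
for pair, (z, p) in results.items():
    print(pair, round(z, 3), round(p, 3))
```

An ordinary Mann-Whitney test on, say, approach 2 versus approach 3 would instead re-rank only those six values and use a variance based on those two groups alone.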

As in any other omnibus/post hoc testing scenario, post hoc tests that have been adjusted for multiple comparisons may fail to reject any of the pairwise null hypotheses at a given family-wise error rate or false discovery rate, even though the Kruskal-Wallis omnibus test rejected at the corresponding $\alpha$.
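As a rough illustration of how that can happen, the sketch below applies Bonferroni and Holm adjustments (both control the family-wise error rate) to three pairwise p-values. The p-values are hypothetical, chosen only for illustration, and the function names are my own.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: multiply each by the number of tests."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted p-values never decrease.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical unadjusted pairwise p-values for three comparisons.
raw_p = [0.02, 0.03, 0.20]
alpha = 0.05

print("Bonferroni:", bonferroni_adjust(raw_p))
print("Holm:      ", holm_adjust(raw_p))
print("Any rejected (Holm)?", any(p <= alpha for p in holm_adjust(raw_p)))
```

Here two of the unadjusted p-values are below 0.05, yet after either adjustment none of the three comparisons is rejected at that level.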

Unless you have an a priori reason to believe that one group's survival time is longer or shorter than another's, you should be using two-sided tests.
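The pattern in the question (no significance two-sided, significance one-sided) is what one would expect from a borderline result, because for a z-based test the one-sided p-value in the observed direction is half the two-sided one. A small sketch, using a hypothetical z statistic of 1.8 that is not taken from the question's data:

```python
from statistics import NormalDist


def p_values_from_z(z):
    """Return (two_sided, upper_tail, lower_tail) p-values for a z statistic."""
    phi = NormalDist().cdf
    two_sided = 2 * (1 - phi(abs(z)))
    upper_tail = 1 - phi(z)  # alternative: first group tends to larger values
    lower_tail = phi(z)      # alternative: first group tends to smaller values
    return two_sided, upper_tail, lower_tail


z = 1.8  # hypothetical test statistic, for illustration only
two_sided, upper_tail, lower_tail = p_values_from_z(z)
print("two-sided:", round(two_sided, 4))
print("one-sided (upper):", round(upper_tail, 4))
print("one-sided (lower):", round(lower_tail, 4))
```

Picking the direction after looking at the data effectively halves the p-value without justification, which is why the one-sided test needs a prior reason.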

Dunn's test can be performed in Stata using dunntest (type net describe dunntest, from(https://www.alexisdinno.com/stata)), and in R using the dunn.test package.

Also, I wonder if you might take a survival analysis approach to assessing whether and when an animal dies based on different conditions?


* A few less well-known post hoc pair-wise tests to follow a rejected Kruskal-Wallis include Conover-Iman (like Dunn, but based on the t distribution rather than the z distribution; implemented for Stata in the conovertest package, and for R in the conover.test package) and the Dwass-Steel-Critchlow-Fligner tests.