ANOVA – Choosing Between Mann-Whitney U Test and Kruskal-Wallis Test for Comparing Medians

anova, kruskal-wallis-test, wilcoxon-mann-whitney-test

I know that the Kruskal-Wallis test can be used to compare the medians of two or more groups (e.g. see this link: https://www.statisticshowto.datasciencecentral.com/kruskal-wallis/ ), and that the Mann-Whitney U test can be used for two groups.

My question is: can we use the Kruskal-Wallis test all the time? It can do what the Mann-Whitney U test does (when comparing two groups). If the answer is yes, then why don't we just discard the Mann-Whitney U test, i.e. why does the Mann-Whitney U test exist?

Best Answer

  1. With two samples, the Kruskal-Wallis test is equivalent to the two-sided Wilcoxon-Mann-Whitney test, but it carries no direction information, so you lose the ability to do a one-sided test.

  2. Some implementations use the exact distribution for small samples with the Wilcoxon-Mann-Whitney but not with the Kruskal-Wallis (yielding not-so-accurate p-values with small samples). R is one example: if both sample sizes are below 50 (and there are no ties), it uses the exact null distribution for the Wilcoxon-Mann-Whitney, but it always uses the chi-squared approximation for the Kruskal-Wallis. This sometimes leads to rejection when you should not reject (as in the example below), or to failure to reject when you should.

    wilcox.test(a,b)
    
            Wilcoxon rank sum test
    
    data:  a and b
    W = 90, p-value = 0.05032
    alternative hypothesis: true location shift is not equal to 0
    

    kruskal.test(values~ind,stack(list(a=a,b=b)))
    
            Kruskal-Wallis rank sum test
    
    data:  values by ind
    Kruskal-Wallis chi-squared = 3.913, df = 1, p-value = 0.04791
    
  3. Even at larger samples the results may differ if the Wilcoxon-Mann-Whitney implements a continuity correction (as in R) but the Kruskal-Wallis does not. I don't recall offhand seeing any packages that implement a continuity correction for the Kruskal-Wallis (nor is it quite clear how to do this for more than two groups, or that it would be a good idea to do so). This is less important, since both are approximations and neither is the 'correct' answer, but it is still a potential difference in their decisions in cases where the p-value is near the significance level.
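
The three points above can be checked side by side in R. The following is a minimal sketch rather than a definitive comparison: the vectors `a` and `b` are made-up illustrative values (they are not the data behind the output shown earlier), chosen to be small and free of ties so that `wilcox.test` uses the exact distribution by default.

```r
# Illustrative data only (assumed values, no ties, both samples well below 50).
a <- c(1.1, 2.3, 2.9, 3.4, 4.0, 4.6, 5.2, 6.1)
b <- c(3.8, 4.9, 5.5, 6.3, 6.8, 7.4, 8.0, 9.2)
d <- stack(list(a = a, b = b))  # long format: columns `values` and `ind`

# Kruskal-Wallis with two groups: chi-squared approximation, df = 1.
p_kw <- kruskal.test(values ~ ind, data = d)$p.value

# Point 1: the two-sided WMW test with the plain normal approximation
# (no exact distribution, no continuity correction) gives the same p-value.
p_wmw_normal <- wilcox.test(a, b, exact = FALSE, correct = FALSE)$p.value
same_as_kw <- isTRUE(all.equal(p_kw, p_wmw_normal))

# ...but only the WMW test can be made one-sided.
p_wmw_less <- wilcox.test(a, b, alternative = "less",
                          exact = FALSE, correct = FALSE)$p.value

# Point 2: by default R uses the exact null distribution here,
# so the p-value differs from the chi-squared approximation.
p_wmw_exact <- wilcox.test(a, b)$p.value

# Point 3: with the normal approximation, R applies a continuity
# correction by default, which again moves the p-value away from KW.
p_wmw_corrected <- wilcox.test(a, b, exact = FALSE)$p.value

print(c(kruskal = p_kw, wmw_normal = p_wmw_normal, wmw_less = p_wmw_less,
        wmw_exact = p_wmw_exact, wmw_corrected = p_wmw_corrected))
```

Only the `exact = FALSE, correct = FALSE` call should reproduce the Kruskal-Wallis p-value; the default (exact) and continuity-corrected calls will generally differ from it, which is where the two tests can reach different decisions near the significance level.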