Can the Wilcoxon rank sum test give a different result from the Kolmogorov-Smirnov test?

hypothesis testing, kolmogorov-smirnov test, r, wilcoxon-mann-whitney-test

Let's say I have two data sets (in R, say): $x_1, x_2, …, x_n$ and $y_1, y_2, …, y_n$.

The Wilcoxon rank sum test rejects, indicating that the "X" population distribution differs from that for "Y".

Is it possible that the two sample Kolmogorov-Smirnov test would not indicate that they're different?

Or can we predict that the Wilcoxon would not cause us to reject the null if the Kolmogorov Smirnov test did not?

Best Answer

The crux of @Glen_b's answer (+1) is that these are two different tests that are "designed to pick up... [different and] specific kinds of differences" between the two distributions. So to understand how the results (in terms of whether they are significant or not) can differ between the Wilcoxon rank sum test and the Kolmogorov-Smirnov test, we need to understand what each test is designed to detect.

  • The Wilcoxon rank sum test tests if:

    the probability of an observation from the population X exceeding an observation from the second population Y equals the probability of an observation from Y exceeding an observation from X: P(X > Y) = P(Y > X) or P(X > Y) + 0.5 · P(X = Y) = 0.5

    That is, it is testing if values of X tend to be larger or smaller than values of Y.

  • The Kolmogorov-Smirnov test assesses the largest¹ difference between the two empirical cumulative distribution functions (ECDFs) and compares it to its sampling distribution assuming the distributions are the same.
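To make the two quantities concrete, here is a minimal sketch that recomputes both test statistics by hand. The data, sample sizes, and variable names are made up for illustration and are not part of the original answer. It relies on the fact that the `W` reported by `wilcox.test()` counts the pairs in which the x value exceeds the y value (ties counting one half), so `W / (n_x * n_y)` estimates P(X > Y) + 0.5 · P(X = Y).

```r
# Hypothetical example data (not from the original answer)
set.seed(1)
x <- rnorm(20)
y <- rnorm(25, mean = 0.5)

# Wilcoxon side: W counts the pairs with x > y, ties counting one half
W <- unname(wilcox.test(x, y)$statistic)
W_by_hand <- sum(outer(x, y, ">")) + 0.5 * sum(outer(x, y, "=="))

# Dividing by the number of pairs estimates P(X > Y) + 0.5 * P(X = Y)
p_superiority <- W / (length(x) * length(y))

# KS side: D is the largest vertical gap between the two ECDFs;
# the gap can only change at observed values, so checking those is enough
Fx <- ecdf(x)
Fy <- ecdf(y)
pooled <- c(x, y)
D_by_hand <- max(abs(Fx(pooled) - Fy(pooled)))
D <- unname(ks.test(x, y)$statistic)

all.equal(W, W_by_hand)
all.equal(D, D_by_hand)
p_superiority
```

A value of `p_superiority` far from 0.5 is what the rank sum test responds to, whereas `D` can be large even when that value is close to 0.5.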

From here, it is easy to see how there can be datasets where the tests will yield different results.

  • The Wilcoxon will be significant while the KS will not when one sample is consistently greater than the other, but not by a large absolute value, and where the distribution shapes are largely the same.

    set.seed(9825)
    g1 = rnorm(10)
    g2 = g1+1.27   # same values shifted up by a constant, so the shapes are identical
    
    wilcox.test(g1, g2)
    #   Wilcoxon rank sum test
    # 
    # data:  g1 and g2
    # W = 22, p-value = 0.03546
    # alternative hypothesis: true location shift is not equal to 0
    ks.test(g1, g2)
    #   Two-sample Kolmogorov-Smirnov test
    # 
    # data:  g1 and g2
    # D = 0.5, p-value = 0.1678
    # alternative hypothesis: two-sided
    


  • The KS will be significant while the rank sum test will not when the means and medians are the same but the shapes differ markedly.

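    set.seed(3806)
    # scale(..., scale=FALSE) only subtracts the sample mean, so both samples are centered at 0
    g1 = scale(rnorm(15),       center=TRUE, scale=FALSE)
    g2 = scale(rnorm(15, sd=5), center=TRUE, scale=FALSE)   # same center, much larger spread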
    
    wilcox.test(g1, g2)
    #   Wilcoxon rank sum test
    # 
    # data:  g1 and g2
    # W = 131, p-value = 0.461
    # alternative hypothesis: true location shift is not equal to 0
    ks.test(g1, g2)
    #   Two-sample Kolmogorov-Smirnov test
    # 
    # data:  g1 and g2
    # D = 0.53333, p-value = 0.02625
    # alternative hypothesis: two-sided
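Both examples rest on a single seed each, so it can help to look at how often each test rejects over many simulated datasets. The following is a rough sketch, not part of the original answer: the helper `reject_rates()`, the sample size of 30, the shift of 0.7, the standard deviation of 5, and the number of replications are all illustrative assumptions.

```r
# Estimate how often each test rejects at level alpha.
# gen_x and gen_y are functions that each return one fresh sample.
reject_rates <- function(gen_x, gen_y, n_sim = 500, alpha = 0.05) {
  p <- replicate(n_sim, {
    x <- gen_x()
    y <- gen_y()
    c(wilcoxon = wilcox.test(x, y)$p.value,
      ks       = ks.test(x, y)$p.value)
  })
  # p is a 2 x n_sim matrix; each row mean is the rejection rate of one test
  rowMeans(p < alpha)
}

set.seed(42)
n <- 30

# Scenario 1: same shape, shifted location
shift <- reject_rates(function() rnorm(n),
                      function() rnorm(n, mean = 0.7))

# Scenario 2: same center, very different spread
spread <- reject_rates(function() rnorm(n),
                       function() rnorm(n, sd = 5))

rbind(shift, spread)
```

Under these assumptions one would expect the rank sum test to reject more often than the KS test in the shift scenario and the reverse in the spread scenario, although the exact rates depend on the settings chosen.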
    


1. More technically, the supremum.