Solved – Can the Wilcoxon rank sum test give a different result from the Kolmogorov-Smirnov test?

Tags: hypothesis-testing, kolmogorov-smirnov-test, r, wilcoxon-mann-whitney-test

Let's say I have two data sets (in R, say): $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$.

The Wilcoxon rank sum test rejects, indicating that the "X" population distribution differs from that for "Y".

Is it possible that the two sample Kolmogorov-Smirnov test would not indicate that they're different?

Or can we predict that the Wilcoxon test would not lead us to reject the null if the Kolmogorov-Smirnov test did not?

Best Answer

The crux of @Glen_b's answer (+1) is that these are two different tests that are "designed to pick up... [different and] specific kinds of differences" between the two distributions. So to understand how the results (in terms of whether they are significant or not) can differ between the Wilcoxon rank sum test and the Kolmogorov-Smirnov test, we need to understand what each test is designed to detect.

The Wilcoxon rank sum test tests if:

the probability of an observation from population X exceeding an observation from population Y equals the probability of an observation from Y exceeding an observation from X: $P(X > Y) = P(Y > X)$ or, equivalently, $P(X > Y) + 0.5 \cdot P(X = Y) = 0.5$

That is, it is testing if values of X tend to be larger or smaller than values of Y.

The Kolmogorov-Smirnov test assesses the largest¹ difference between the two empirical cumulative distribution functions (ECDFs) and compares it to its sampling distribution under the assumption that the two distributions are the same.
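To make the two definitions concrete, here is a minimal sketch that computes both statistics by hand and checks them against R's built-in tests. The seed, sample sizes, and variable names (W_manual, D_manual, and so on) are arbitrary choices for illustration, not part of the original question.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
x <- rnorm(20)                   # illustrative samples (assumed, not from the question)
y <- rnorm(25, mean = 0.5)

# Wilcoxon rank sum: count the (x, y) pairs in which x exceeds y; ties count one half
W_manual <- sum(outer(x, y, ">")) + 0.5 * sum(outer(x, y, "=="))

# Dividing by the number of pairs estimates P(X > Y) + 0.5 * P(X = Y)
p_hat <- W_manual / (length(x) * length(y))

# Kolmogorov-Smirnov: largest vertical gap between the two ECDFs,
# evaluated at every pooled observation
pooled   <- c(x, y)
D_manual <- max(abs(ecdf(x)(pooled) - ecdf(y)(pooled)))

# The same statistics as reported by the built-in tests
W_r <- unname(wilcox.test(x, y)$statistic)
D_r <- unname(ks.test(x, y)$statistic)

all.equal(W_manual, W_r)
all.equal(D_manual, D_r)
```

The rank sum statistic only looks at how often one sample's values exceed the other's, whereas D looks at the single largest gap between the ECDFs, which is why the two can disagree.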

From here, it is easy to see how there can be datasets where the tests will yield different results.

The Wilcoxon can be significant while the KS is not when the values in one sample are consistently greater than those in the other, but not by a large amount, and the distribution shapes are largely the same.

set.seed(9825)
g1 = rnorm(10)
g2 = g1+1.27

wilcox.test(g1, g2)
#   Wilcoxon rank sum test
# 
# data:  g1 and g2
# W = 22, p-value = 0.03546
# alternative hypothesis: true location shift is not equal to 0
ks.test(g1, g2)
#   Two-sample Kolmogorov-Smirnov test
# 
# data:  g1 and g2
# D = 0.5, p-value = 0.1678
# alternative hypothesis: two-sided

The KS can be significant while the rank sum test is not when the means and medians are (nearly) the same but the shapes or spreads differ markedly.

set.seed(3806)
g1 = scale(rnorm(15),       center=TRUE, scale=FALSE)
g2 = scale(rnorm(15, sd=5), center=TRUE, scale=FALSE)

wilcox.test(g1, g2)
#   Wilcoxon rank sum test
# 
# data:  g1 and g2
# W = 131, p-value = 0.461
# alternative hypothesis: true location shift is not equal to 0
ks.test(g1, g2)
#   Two-sample Kolmogorov-Smirnov test
# 
# data:  g1 and g2
# D = 0.53333, p-value = 0.02625
# alternative hypothesis: two-sided
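The two examples above are single datasets chosen to show the disagreement. As a rough check that the pattern is not just a seed artifact, the following sketch simulates many datasets under each scenario and records how often each test rejects at the 5% level. The helper reject_rate() and its arguments are hypothetical names introduced here, and the seed and number of simulations are arbitrary.

```r
set.seed(2024)  # arbitrary seed (an assumption), for reproducibility only

# Fraction of simulated datasets in which each test rejects at level alpha.
# draw_x and draw_y are functions that each return one fresh sample.
reject_rate <- function(draw_x, draw_y, n_sim = 1000, alpha = 0.05) {
  p <- replicate(n_sim, {
    x <- draw_x()
    y <- draw_y()
    c(wilcoxon = wilcox.test(x, y)$p.value,
      ks       = ks.test(x, y)$p.value)
  })
  rowMeans(p < alpha)  # one rejection rate per test
}

# Scenario 1: same shape, location shifted by 1.27 (as in the first example)
shift <- reject_rate(function() rnorm(10),
                     function() rnorm(10, mean = 1.27))

# Scenario 2: same center, very different spread (as in the second example)
spread <- reject_rate(function() rnorm(15),
                      function() rnorm(15, sd = 5))

shift
spread
```

Under these assumptions one would expect the rank sum test to reject more often than the KS test in the shift scenario, and the reverse in the spread scenario.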

¹ More technically, the supremum.

Related Solutions

Solved – Kolmogorov-Smirnov two-sample test

I am assuming you are asking because the Suanshu help page reports, in reference to the K-S distribution, "This is not done yet." Luckily, it is very easy to do in R. If x and y are your two samples, ks.test(x,y) returns the test statistic and p-value. For example,

x <- rnorm(50)
y <- runif(30)
ks.test(x, y)
#   Two-sample Kolmogorov-Smirnov test
#
# data:  x and y
# D = 0.5, p-value = 9.065e-05
# alternative hypothesis: two-sided

By default, it will compute exact or asymptotic p-values based on the product of the sample sizes (exact p-values for n.x*n.y < 10000 in the two-sample case), or you can set this explicitly with the exact argument (exact=FALSE or exact=TRUE). R's documentation cites Marsaglia et al. (2003), which the Suanshu documentation also cites, for exact p-values in the one-sample case. Some large-sample approximations are given here, although I don't have a proper citation. Lastly, if you don't want to install R, there are web calculators for the two-sample K-S test, although I don't know if they use the same algorithm as R because the one I found only reported three decimal places for the p-value.
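As a small illustration of the exact argument, the sketch below runs the same two-sample test three ways. The seed is arbitrary, and the sample sizes simply mirror the example above (50 × 30 = 1500, which is below the 10000 threshold, so the default should coincide with the exact computation).

```r
set.seed(42)     # arbitrary seed, for reproducibility only
x <- rnorm(50)
y <- runif(30)

default_fit <- ks.test(x, y)                 # exact is chosen automatically
exact_fit   <- ks.test(x, y, exact = TRUE)   # force the exact p-value
asymp_fit   <- ks.test(x, y, exact = FALSE)  # force the asymptotic approximation

# D does not depend on how the p-value is computed; only the p-value changes
c(D_exact = unname(exact_fit$statistic), D_asymp = unname(asymp_fit$statistic))
c(p_default = default_fit$p.value,
  p_exact   = exact_fit$p.value,
  p_asymp   = asymp_fit$p.value)
```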

Solved – Use of two-sample Kolmogorov-Smirnov test to evaluate similarities between two different distributions

You say you're trying to make the two distributions close; the $D$ statistic measures the discrepancy between the two, and (as long as it's not changing the dimension of the fit) choosing $r$ to minimize that discrepancy makes complete sense.

I don't think there's any need to deal with the $p$-value; $D$ is a sensible thing to optimize.

If you're changing the dimension of the fit (adding or removing parameters), just minimizing $D$ isn't going to be sufficient, since more parameters will always tend to improve the fit.
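A minimal sketch of minimizing $D$ over a single parameter, under heavy assumptions: the original problem's model and the meaning of $r$ are not given here, so this stand-in treats $r$ as the rate of an exponential distribution and matches a simulated "target" sample. The names target, u, ks_distance, and r_grid are all hypothetical.

```r
set.seed(7)                        # arbitrary seed, for reproducibility only
target <- rexp(200, rate = 2)      # stand-in for the distribution to be matched
u      <- runif(200)               # fixed uniforms, so D is a deterministic function of r

# Two-sample KS distance between the target and a candidate sample generated with rate r
ks_distance <- function(r) {
  candidate <- qexp(u, rate = r)   # inverse-CDF draw using the fixed uniforms
  unname(ks.test(target, candidate)$statistic)
}

# D is piecewise constant in r, so a simple grid search is used instead of a smooth optimizer
r_grid <- seq(0.5, 5, by = 0.05)
D_grid <- sapply(r_grid, ks_distance)
r_best <- r_grid[which.min(D_grid)]

r_best
min(D_grid)
```

Only the statistic is used; the p-value is ignored, in line with the point that $D$ itself is the sensible thing to optimize when the number of parameters is fixed.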
