Kolmogorov-Smirnov vs Mann-Whitney U When There Are Ties

Tags: kolmogorov-smirnov test, ties, wilcoxon-mann-whitney-test

I have a dataset consisting of rank data: about 100 cases in 2 groups. (The 2 groups contain about 1/3 and 2/3 of the cases.) I would like to test whether the two groups differ with respect to median rank. I used a Mann-Whitney U test. A colleague suggested that when there are many ties, a Kolmogorov-Smirnov test is more accurate. Is that so? To what extent? In any case, the M-W test shows statistical significance and the K-S does not, so I have two questions: (1) Do ties affect the alpha error rate in the M-W test, or just the beta error rate? (2) How do the K-S test (which tests more than just the median) and the M-W U test compare when attempting to detect differences in medians (with respect to power and alpha error)? In short, which test do I trust?

Best Answer

I'm not sure what the basis is for your colleague's claim, but they should support the claims they make before you accept them as true; there's an astonishing amount of misinformed folklore about. (How do they know that this is true? Do you have good reason to think it must be true in your case?)

Both tests assume$^\dagger$ continuous distributions and both are impacted by ties (however, it's relatively easy to deal with ties in the Mann-Whitney and some software will do so automatically).

--

$\dagger$ Edit: To support my claim of the assumption of continuity in respect of the Mann-Whitney (since whuber says I am wrong on this point, I had better justify it), I refer to the beginning of Mann and Whitney (1947):

1. Summary. Let $x$ and $y$ be two random variables with continuous cumulative distribution functions $f$ and $g$.

So for Mann and Whitney's version of the test, they do explicitly assume continuity - and not idly, since they rely on it in their derivation. However, it's possible (as I mention later) to deal with ties in the Mann-Whitney by working out the null distribution of the test statistic given the observed pattern of ties, or by correctly computing the effect of ties on the variance of the statistic under the normal approximation (what's usually referred to as the 'adjustment for ties').
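The 'adjustment for ties' in the normal approximation replaces the no-ties variance $n_1 n_2 (N+1)/12$ of $U$ with

$$\operatorname{Var}(U)=\frac{n_1 n_2}{12}\left[(N+1)-\frac{\sum_j (t_j^3-t_j)}{N(N-1)}\right],$$

where $t_j$ is the number of observations tied at the $j$-th distinct value and $N=n_1+n_2$. Below is a minimal Python sketch of that calculation, using mid-ranks for tied values. The function name is hypothetical, and unlike R's `wilcox.test` (which applies a continuity correction by default) it uses no continuity correction.

```python
import math
from collections import Counter


def mann_whitney_normal(x, y, adjust_ties=True):
    """Two-sided Mann-Whitney test via the normal approximation, no continuity correction.

    Returns (U, Var(U), p). adjust_ties=False uses the no-ties variance n1*n2*(N+1)/12.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    counts = Counter(list(x) + list(y))
    # Mid-ranks: tied values share the average of the ranks they jointly occupy
    midrank, below = {}, 0
    for v in sorted(counts):
        midrank[v] = below + (counts[v] + 1) / 2
        below += counts[v]
    u = sum(midrank[v] for v in x) - n1 * (n1 + 1) / 2  # U for the first sample
    # Sum of (t^3 - t) over groups of tied values; zero when there are no ties
    tie_term = sum(t**3 - t for t in counts.values()) if adjust_ties else 0
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail area
    return u, var_u, p


x = [1, 1, 2, 2, 2, 3]
y = [2, 3, 3, 3, 4, 4]
u, var_adj, p_adj = mann_whitney_normal(x, y)
_, var_raw, p_raw = mann_whitney_normal(x, y, adjust_ties=False)
print(u, var_adj, var_raw)  # 4.0 36.0 39.0
```

In this heavily tied toy example the adjusted variance (36) is smaller than the no-ties variance (39). Ignoring the ties therefore overstates the variance and makes the p-value too large, which is the conservative direction.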

--

For both tests, if the effect of the ties is not properly dealt with, both kinds of error rate are affected: the actual type I error rates fall below the nominal level, and lowering the actual significance level necessarily lowers power ($=1-\beta$).

It's not 100% clear to me which test would be more affected, nor under what circumstances, but offhand I'd have expected the KS test to be generally the more sensitive to ties* - and this is even before one 'adjusts' the Mann-Whitney for ties (i.e. if you used the normal approximation with the variance for the no-ties case).

*(personally, I'd use simulation suited to the specific instance to see what the properties would be under the sorts of conditions you see, at those sample sizes.)

Below is an illustration of the impact of a moderate level of ties$^\ddagger$ on the distribution of p-values when the two population distributions are identical. The sample sizes are 33 and 67, and the tests use the default settings in R (which, for the Mann-Whitney at this sample size, means the normal approximation with the variance correctly calculated in the presence of ties):

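[Figure: distributions of p-values under the null for the Mann-Whitney and Kolmogorov-Smirnov tests with tied data, sample sizes 33 and 67]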

For the tests to work 'as advertised' under the null, these distributions should look close to uniform. As you can see, the Mann-Whitney (at least when the variance of the sum of the ranks is properly calculated in the presence of ties, as here) is indeed very close to uniform. For the Kolmogorov-Smirnov test, by contrast, the proportion of p-values below $\alpha$ is much smaller than $\alpha$, so the test is highly conservative, with corresponding effects on power. [If anything, the effect is somewhat stronger than I'd have anticipated.]

$\ddagger\,$(the impact on the variance of the test statistic is fairly small in percentage terms)
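Here is a rough Python sketch of this kind of null simulation, in place of the R code used for the plot. The assumptions are several:

* The tie pattern is hypothetical: standard normal data rounded to multiples of 0.5, chosen only to produce a fair number of ties.
* The Mann-Whitney p-value uses the tie-corrected normal approximation without continuity correction.
* The KS p-value comes from the asymptotic Kolmogorov distribution, which ignores ties and is not necessarily what R's `ks.test` computes.
* All function names are made up for illustration.

```python
import math
import random
from bisect import bisect_right
from collections import Counter


def mw_pvalue(x, y):
    """Two-sided Mann-Whitney p-value: normal approximation with the
    tie-corrected variance of U (no continuity correction)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    counts = Counter(list(x) + list(y))
    # Mid-rank of each distinct value: (number of smaller values) + (ties + 1) / 2
    midrank, below = {}, 0
    for v in sorted(counts):
        midrank[v] = below + (counts[v] + 1) / 2
        below += counts[v]
    u = sum(midrank[v] for v in x) - n1 * (n1 + 1) / 2
    tie_term = sum(t**3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # every observation identical: no evidence either way
        return 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))


def ks_pvalue(x, y, terms=100):
    """Two-sided KS p-value from the asymptotic Kolmogorov distribution,
    which takes no account of ties."""
    xs, ys = sorted(x), sorted(y)
    n1, n2 = len(xs), len(ys)
    # With ties, the ECDFs only jump at the distinct pooled values
    d = max(abs(bisect_right(xs, v) / n1 - bisect_right(ys, v) / n2)
            for v in set(xs) | set(ys))
    lam = math.sqrt(n1 * n2 / (n1 + n2)) * d
    if lam < 0.2:  # the series converges slowly here, and the true value is ~1
        return 1.0
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2)
                for k in range(1, terms + 1))
    return min(max(p, 0.0), 1.0)


def rejection_rates(n1=33, n2=67, shift=0.0, reps=2000, alpha=0.05, seed=1):
    """Share of simulated data sets with p < alpha, as (Mann-Whitney, KS).

    shift = 0 estimates the actual type I error rate; shift != 0 estimates
    power against a location shift of the underlying normal (before rounding).
    """
    rng = random.Random(seed)

    def draw(n, mu):
        # HYPOTHETICAL tie pattern: normal data rounded to a grid of width 0.5
        return [round(2 * rng.gauss(mu, 1)) / 2 for _ in range(n)]

    mw_hits = ks_hits = 0
    for _ in range(reps):
        x, y = draw(n1, 0.0), draw(n2, shift)
        mw_hits += mw_pvalue(x, y) < alpha
        ks_hits += ks_pvalue(x, y) < alpha
    return mw_hits / reps, ks_hits / reps


null_mw, null_ks = rejection_rates()
print(f"null rejection rate: Mann-Whitney {null_mw:.3f}, KS {null_ks:.3f}")
```

With `shift=0` you would expect the Mann-Whitney rate to be close to the nominal 0.05 and the KS rate to be noticeably below it. A nonzero `shift` estimates power against a location shift under the same tie pattern. The real value of the exercise comes from replacing `draw` with something that mimics your own data.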

Further, if your interest lies in a location-shift alternative, the Mann-Whitney would have greater power against that alternative to start with, so even if it did lose more power as a result of the discreteness (which I doubt), it may still have more power afterward.

You don't say how heavily tied your data are, nor in what sort of pattern. If both tests are more affected than you're prepared to accept, you can work with the permutation distribution of either test statistic for your data (or with the permutation distribution of some other statistic, including a difference in sample medians if you wish).
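The permutation approach can be sketched as a Monte Carlo (randomization) test, because full enumeration is out of reach at around 100 observations. This is a minimal standard-library sketch. The names `permutation_pvalue` and `median_difference` are hypothetical. The two-sided p-value uses the absolute value of the statistic, which is one common convention among several. The +1 in the numerator and denominator counts the observed labelling as one of the permutations.

```python
import random
import statistics


def permutation_pvalue(x, y, stat, reps=9999, seed=0):
    """Randomization (Monte Carlo permutation) two-sided p-value for stat(x, y).

    The reference distribution is built by reshuffling the observed pooled
    values, so it automatically reflects whatever pattern of ties they have.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n1 = len(x)
    observed = abs(stat(x, y))
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # random relabelling of observations to groups
        if abs(stat(pooled[:n1], pooled[n1:])) >= observed - 1e-12:
            count += 1
    # Add-one: the observed labelling is itself one of the permutations
    return (count + 1) / (reps + 1)


def median_difference(x, y):
    """Difference in sample medians (one possible choice of statistic)."""
    return statistics.median(x) - statistics.median(y)


group_a = [1, 2, 2, 3, 3, 3, 4]
group_b = [2, 3, 3, 4, 4, 5, 5, 5]
print(permutation_pvalue(group_a, group_b, median_difference))
```

You can pass any two-sample statistic as `stat`, such as a rank sum or a KS statistic, in place of the median difference.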

In spite of many books (especially in some particular areas of application) stating that it is, the Mann-Whitney is not actually a test for a difference in medians. However, if you additionally assume that the population distributions are the same under the null, and restrict the alternative to a location shift, then it's a test for a difference in any reasonable location measure: population medians, population lower quartiles, even population means (if they exist).

Indeed, one needn't restrict oneself to location-shift alternatives. Testing identical distributions under the null against an alternative that moves the medians (or any other measure of location) will work; for example, it would work perfectly well as a test of medians under an assumption of a scale shift. We must keep in mind, however, that the Mann-Whitney is a far more general test than that, and that when we rely on an assumption to make it a test of medians or whatever, we really do lean on that assumption for the conclusion to mean what we want it to.
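A small exact calculation shows why equal medians are not enough on their own. The two three-point populations below are hypothetical and chosen only for illustration. Both have median 0, yet $P(Y>X)+\tfrac12 P(Y=X)=11/18\neq 1/2$. Departures of this quantity from $1/2$ are what the Mann-Whitney statistic responds to, so with enough data the test would tend to reject even though the medians are identical.

```python
import statistics
from fractions import Fraction
from itertools import product


def prob_y_exceeds_x(xs, ys):
    """P(Y > X) + P(Y = X)/2 for independent X uniform on xs and Y uniform on ys."""
    total = Fraction(0)
    for a, b in product(xs, ys):
        if b > a:
            total += 1
        elif b == a:
            total += Fraction(1, 2)  # ties count half, as with mid-ranks
    return total / (len(xs) * len(ys))


# HYPOTHETICAL populations: each listed value is equally likely
x_pop = [-10, 0, 1]
y_pop = [-1, 0, 10]

print(statistics.median(x_pop), statistics.median(y_pop))  # 0 0
print(prob_y_exceeds_x(x_pop, y_pop))  # 11/18
```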

In short, which test do I trust?

Don't simply trust what anyone says (including me!) unless they have solid evidence. I haven't brought any that's directly relevant to your situation, and none relating to power, because I haven't seen your pattern of ties and I am not 100% sure whether you're only interested in location shifts.

What kind of data do you have (what are you measuring, how are you measuring it, and how do ties arise)? What are you interested in finding out? Why do you mention medians?

Use simulation to find out how any tests you contemplate behave in circumstances similar to yours, and decide for yourself whether there's a problem to worry about. For both tests, see what the impact of ties is, both under the null and under alternatives you care about. Then, for the Mann-Whitney, see the effect of the adjustment for ties and compare it with using the exact permutation distribution (or, in samples as large as yours, the randomization distribution). For the KS you can look at the exact permutation distribution as well.
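For small samples, the exact permutation distribution of the KS statistic, ties included, can be enumerated directly. The sketch below uses hypothetical function names. It visits every way of choosing which positions form the first sample, and it uses exact fractions so that tied values of $D$ compare correctly. At sizes like 33 and 67 the number of splits, $\binom{100}{33}$, is far too large to enumerate, so a random sample of relabellings (the randomization distribution) would be used instead.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import combinations


def ks_statistic(x, y):
    """Two-sample KS statistic max |F_x(v) - F_y(v)| over the pooled values (ties allowed)."""
    xs, ys = sorted(x), sorted(y)
    return max(
        abs(Fraction(bisect_right(xs, v), len(xs)) - Fraction(bisect_right(ys, v), len(ys)))
        for v in set(xs) | set(ys)
    )


def ks_exact_permutation_pvalue(x, y):
    """Exact permutation p-value: share of all relabellings with D >= observed D."""
    pooled = list(x) + list(y)
    n1 = len(x)
    observed = ks_statistic(x, y)
    hits = total = 0
    # Each choice of positions for the first sample is equally likely under the null
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        hits += ks_statistic(gx, gy) >= observed
    return Fraction(hits, total)


print(ks_exact_permutation_pvalue([1, 1, 2], [3, 3, 3]))  # 1/10
```

In the toy example, only 2 of the 20 possible splits separate the groups completely, so the exact p-value is 1/10.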