Solved – Why are ties so difficult in nonparametric statistics?


My nonparametric text, Practical Nonparametric Statistics, often gives clean formulas for expectations, variances, test statistics, and the like, but includes the caveat that these only work if we ignore ties. When calculating the Mann-Whitney U statistic, for example, the reader is encouraged to throw out tied pairs when counting which observation is bigger.
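To make the tie-handling conventions concrete, here is a small illustrative sketch (the function name and data are my own, not from the text) that computes U by direct pairwise comparison, either dropping tied pairs or crediting each tied pair 1/2:

```python
def mann_whitney_u(x, y, ties="drop"):
    """Count pairs (xi, yj) with xi > yj.

    ties="drop": tied pairs are discarded, the convention described above.
    ties="half": tied pairs each contribute 1/2, which matches the
                 midrank convention.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1
            elif xi == yj and ties == "half":
                u += 0.5
    return u

x = [1, 2, 2, 5]
y = [2, 3, 4]
print(mann_whitney_u(x, y, "drop"))  # 3.0
print(mann_whitney_u(x, y, "half"))  # 4.0
```

Note that the two conventions already disagree here: the two tied pairs (2, 2) are either lost entirely or split down the middle.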

I get that ties don't really tell us much about which population is bigger (if that's what we're interested in), since within a tied pair neither observation is bigger than the other, but it doesn't seem like that should matter when developing asymptotic distributions.

Why then is it such a quandary dealing with ties in some nonparametric procedures? Is there a way of extracting any useful information from ties, rather than simply throwing them away?

EDIT: In regard to @whuber's comment, I checked my sources again, and some procedures use an average of ranks instead of dropping the tied values completely. While this seems more sensible in terms of retaining information, it also seems to me to lack rigor. The spirit of the question still stands, however.
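The average-of-ranks (midrank) convention can be sketched in a few lines; this is an illustration of the idea, not code from any particular package:

```python
def midranks(values):
    """Assign ranks 1..n, giving every member of a tied group the
    average of the ranks that group spans (the 'midrank')."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # find the end of the current tied group
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # average of ranks (i+1) .. (j+1)
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

print(midranks([3, 1, 4, 1, 5]))  # [3.0, 1.5, 4.0, 1.5, 5.0]
```

The two tied 1's would have received ranks 1 and 2, so each gets 1.5; the total of all ranks is unchanged, which is one sense in which no information is thrown away.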

Best Answer

Most of the original work on non-parametrics assumed an underlying continuous distribution, in which ties are impossible (if the data are measured accurately enough). The theory can then be based on the distributions of order statistics (which are much simpler without ties) or related results. In some cases the test statistic turns out to be approximately normal, which makes things really easy. When ties are introduced, either because the data were rounded or because they are naturally discrete, those standard assumptions no longer hold. The approximation may still be good enough in some cases, but not in others, so often the easiest thing to do is simply warn that the formulas don't apply when there are ties.
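As an illustration of how ties enter the normal approximation, here is a sketch of the standard tie-corrected null variance of the Mann-Whitney U statistic under the midrank convention (the formula appears in standard references such as Conover's text; the function itself is my own):

```python
from collections import Counter

def tie_corrected_var_u(x, y):
    """Null variance of Mann-Whitney U with midranks:
        Var(U) = n1*n2/12 * [ (N+1) - sum(t^3 - t) / (N*(N-1)) ]
    where t runs over the sizes of tied groups in the pooled sample.
    With no ties the correction term is zero and this reduces to the
    familiar n1*n2*(N+1)/12.
    """
    n1, n2 = len(x), len(y)
    N = n1 + n2
    tie_term = sum(t**3 - t for t in Counter(list(x) + list(y)).values())
    return n1 * n2 / 12 * ((N + 1) - tie_term / (N * (N - 1)))

print(tie_corrected_var_u([1, 2, 3], [4, 5]))  # 3.0 (no ties)
print(tie_corrected_var_u([1, 2, 2], [2, 3]))  # 2.4 (three tied 2's)
```

The point is that ties shrink the null variance, so plugging in the tie-free formula makes the test conservative or anti-conservative depending on which statistic you pair it with; the correction restores a usable approximation in mildly tied data.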

For some of the standard non-parametric tests there are tools that work out the exact distribution when ties are present; the exactRankTests package for R is one example.

One simple way to deal with ties is to use randomization tests like permutation tests or bootstrapping. These don't worry about asymptotic distributions, but use the data as it is, ties and all (note that with a lot of ties, even these techniques may have low power).
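A minimal sketch of such a permutation test, using the difference in means as the statistic (the names and data here are illustrative):

```python
import random

def permutation_test(x, y, reps=10_000, seed=0):
    """Two-sample permutation test on the difference in means.
    Ties need no special handling: the pooled data, ties and all,
    are simply reshuffled between the two groups, and the p-value is
    the fraction of shuffles at least as extreme as what we observed.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n1 = len(x)
    observed = abs(sum(x) / n1 - sum(y) / len(y))
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        a, b = pooled[:n1], pooled[n1:]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
            hits += 1
    return hits / reps  # approximate p-value

print(permutation_test([1, 1, 2, 2], [5, 5, 6, 6]))
```

Any statistic (rank sum, trimmed mean, etc.) can be dropped in place of the mean difference; the reference distribution comes from the data themselves rather than an asymptotic formula, which is why ties pose no special problem here.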

There was an article a few years back (I thought in The American Statistician, but I am not finding it) that discussed ties and some of the things you can do with them. One point is that it depends on what question you are asking: what to do with ties can be very different in a superiority test than in a non-inferiority test.
