Why do we assign fractional ranks in rank tests?

hypothesis-testing, wilcoxon-mann-whitney-test, wilcoxon-signed-rank

Both the Mann-Whitney U test and the Wilcoxon signed-rank test follow this procedure:

  1. sort the data
  2. assign ranks; ties receive a rank equal to the average of the ranks they span
  3. compute the statistic using a sum of the ranks
  4. compare the statistic against critical values and reach a verdict
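
To make steps 1 to 3 concrete, here is a minimal sketch for the two-sample (Mann-Whitney) case, using only the Python standard library. The function name `average_ranks` and the sample values are made up for illustration; they are not from any particular package.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # step 1: sort
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = ((i + 1) + (j + 1)) / 2  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank      # step 2: assign (possibly fractional) ranks
        i = j + 1
    return ranks


# Hypothetical data: the value 2.0 appears once in each group.
group_a = [1.1, 2.0, 3.5]
group_b = [2.0, 4.2]

ranks = average_ranks(group_a + group_b)
rank_sum_a = sum(ranks[:len(group_a)])       # step 3: sum the ranks of one group

print(ranks)       # [1.0, 2.5, 4.0, 2.5, 5.0]
print(rank_sum_a)  # 7.5
```

The two tied observations would have occupied ranks 2 and 3, so each gets 2.5.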

To me, the averaging of tied ranks in step 2 seems like an unnecessary complication. If we intend to sum the ranks anyway, it makes no difference whether we sum 2 and 3 or 2.5 and 2.5.

Am I missing something?

Best Answer

When ties are internal to a group, assigning the average rank of course makes no difference compared with, say, breaking ties at random, just as you point out. It does matter, however, when there are ties across groups.
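
A small brute-force check of that distinction, as a sketch only: it lists every way of breaking the ties in a tiny pooled sample and collects the resulting rank sums for the first group. The helper name `rank_sums_over_tie_breaks` and the data are hypothetical, and the enumeration over all permutations is only sensible for a handful of observations.

```python
from itertools import permutations


def rank_sums_over_tie_breaks(group_a, group_b):
    """Set of rank sums for group_a over every way of breaking ties."""
    pooled = group_a + group_b
    n_a = len(group_a)
    sums = set()
    for perm in permutations(range(len(pooled))):
        # keep only orderings sorted by value, i.e. valid tie-breakings
        if all(pooled[perm[k]] <= pooled[perm[k + 1]] for k in range(len(perm) - 1)):
            # position in the ordering (1-based) is the rank; indices < n_a are group A
            sums.add(sum(pos + 1 for pos, idx in enumerate(perm) if idx < n_a))
    return sums


within = rank_sums_over_tie_breaks([1, 2, 2], [3, 4])  # tie inside group A only
across = rank_sums_over_tie_breaks([1, 2, 5], [2, 4])  # tie spans both groups

print(within)  # {6}
print(across)  # {8, 9}
```

With the tie inside one group the rank sum is 6 however the tie is broken, which is also what average ranks give (1 + 2.5 + 2.5). With the tie across groups the sum is 8 or 9 depending on the tie-break, while average ranks give 8.5.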

When there are ties across groups, how you deal with them matters, and there are several choices.

The one you mention - giving the average of the ranks to all tied values - is common, but not the only way it is done.

That approach has the disadvantage that you no longer have a set of integers from 1 to n, so it mucks up the distribution of the ranks under the null. Even under a normal approximation, it affects the variance of the distribution, though for many of the common procedures the calculation of the adjusted variance is not so onerous; if ties are not heavy, it will often still be a very good approximation.
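
For the Mann-Whitney case, the usual tie adjustment to the null variance is short enough to write out. The sketch below uses the standard textbook formula, (n_a n_b / 12) [(n + 1) - sum(t^3 - t) / (n (n - 1))], where t runs over the sizes of the groups of tied values; the function name `rank_sum_variance` is my own, not a library call.

```python
from collections import Counter


def rank_sum_variance(group_a, group_b):
    """Null variance of the rank sum (equivalently of U) when average ranks are used."""
    n_a, n_b = len(group_a), len(group_b)
    n = n_a + n_b
    # size of each group of tied values in the pooled sample (size 1 = untied)
    tie_sizes = Counter(group_a + group_b).values()
    tie_term = sum(t**3 - t for t in tie_sizes) / (n * (n - 1))
    return n_a * n_b / 12 * ((n + 1) - tie_term)


no_ties = rank_sum_variance([1, 2, 3], [4, 5])    # 3.0, the usual n_a*n_b*(n+1)/12
with_ties = rank_sum_variance([1, 2, 5], [2, 4])  # about 2.85, slightly smaller
```

Untied values contribute nothing to the correction (t^3 - t is 0 when t = 1), so with light ties the adjusted and unadjusted variances stay close.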

Another approach is simply to break ties at random, which has the virtue of simplicity but means two people may come to different conclusions on the same data.

A third approach is to break ties in all possible ways (or to sample from the set of all possible ways if there are too many to enumerate), and then combine the p-values in some way; taking the average is usually what is done. If all the p-values are on the same side of the significance level, there's no difficulty.
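
Here is a rough sketch of that third approach for a tiny two-sample problem. It assumes a one-sided (lower-tail) test on the rank sum of the first group and enumerates everything by brute force; the function names and data are hypothetical.

```python
from itertools import combinations, permutations


def exact_lower_p(rank_sum, n_a, n):
    """P(W <= rank_sum) when group A's ranks are a random n_a-subset of 1..n."""
    sums = [sum(c) for c in combinations(range(1, n + 1), n_a)]
    return sum(s <= rank_sum for s in sums) / len(sums)


def p_values_over_tie_breaks(group_a, group_b):
    """One exact p-value per way of breaking the ties (tiny samples only)."""
    pooled = group_a + group_b
    n_a, n = len(group_a), len(pooled)
    p_values = []
    for perm in permutations(range(n)):
        # keep only orderings sorted by value, i.e. valid tie-breakings
        if all(pooled[perm[k]] <= pooled[perm[k + 1]] for k in range(n - 1)):
            w = sum(pos + 1 for pos, idx in enumerate(perm) if idx < n_a)
            p_values.append(exact_lower_p(w, n_a, n))
    return p_values


p_values = p_values_over_tie_breaks([1, 2, 2], [2, 3, 4])
averaged_p = sum(p_values) / len(p_values)

print(sorted(set(p_values)))  # [0.05, 0.1, 0.2]
print(averaged_p)             # about 0.1167
```

In this made-up example the individual p-values range from 0.05 to 0.2, so at a 10% level, say, the verdict would depend on how the ties happened to be broken; the average is one way to settle on a single number.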

I think the best approach is to go back to the permutation distribution. You lose the convenience of tables, but it's relatively easy to either enumerate (in small samples) or sample from (in larger samples) the permutation distribution of the ranks. This is the "right" answer, really, and not hard to do with suitable software (it's rather easy in R in many of the common cases).
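
As an illustration of the small-sample case, the sketch below enumerates the permutation distribution of the rank sum with the average ranks held fixed. It is written in plain Python rather than R, assumes a one-sided (lower-tail) test, and all names and data are made up for the example.

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = ((i + 1) + (j + 1)) / 2
        i = j + 1
    return ranks


def permutation_p_lower(group_a, group_b):
    """Exact P(rank sum of A <= observed) over all reassignments of group labels."""
    ranks = average_ranks(group_a + group_b)
    n_a = len(group_a)
    observed = sum(ranks[:n_a])
    # every way of choosing which n_a observations carry the label "A"
    sums = [sum(ranks[i] for i in c) for c in combinations(range(len(ranks)), n_a)]
    return sum(s <= observed for s in sums) / len(sums)


p = permutation_p_lower([1, 2, 2], [2, 3, 4])
print(p)  # 0.15
```

For larger samples the full enumeration grows too quickly, and one would instead draw random relabellings and use the proportion of sampled rank sums at or below the observed one.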