Wilcoxon-Mann-Whitney Test – The Exact Distribution of Wilcoxon Rank-Sum Statistic U Explained

descriptive statistics, distributions, nonparametric, wilcoxon-mann-whitney-test

The distribution of the rank-sum statistic U is assumed to be normal when the number of samples is large. What is the exact distribution? I want to compare, and sometimes fuse, results from various tests, some of which might not have a large number of samples. I want the exact distribution in cases where, say, $n_1 n_2 < 30$. Is there a closed form that can be used or calculated?

Update:
Apparently, people cite the following paper for the exact distribution, but I have not been able to find either the paper or the result yet: Streitberg, B. and J. Rohmel, Exact distributions for permutation and rank tests: An introduction to some recently published algorithms, Statist. Software Newsletter 1 (1986) 10-17.

Best Answer

AFAIK, there is no closed form for the distribution. Using R, the naive implementation of getting the exact distribution works for me up to group sizes of at least 12; that takes less than 1 minute on a Core i5 running Windows 7 64-bit and a current version of R. For R's own, more clever algorithm in C that is used in pwilcox(), you can check the source file src/nmath/wilcox.c.
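If only the probabilities are needed, the built-in dwilcox() and pwilcox() can be called directly. The following is a minimal sketch with illustrative group sizes 4 and 5 (the names m, n and these sizes are assumptions, not taken from the question). Note that these functions work on the U scale, i.e. the rank sum of the first group minus $m(m+1)/2$, so rank sums are recovered by adding that constant back.

```r
m  <- 4                                 # size group 1 (illustrative)
n  <- 5                                 # size group 2 (illustrative)
u  <- 0:(m*n)                           # support of U: 0, ..., m*n
dU <- dwilcox(u, m, n)                  # exact P(U = u)
pU <- pwilcox(u, m, n)                  # exact P(U <= u)
w  <- u + m*(m+1)/2                     # corresponding rank sums of group 1
data.frame(u=u, rankSum=w, prob=dU, cumProb=pU)
```

This avoids enumerating any samples, so it also works for group sizes where the naive approach becomes slow.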

n1 <- 12                                # size group 1
n2 <- 12                                # size group 2
N  <- n1 + n2                           # total number of subjects

Now generate all possible cases for the ranks within group 1. These are all ${N \choose n_{1}}$ different samples of size $n_{1}$ from the numbers $1, \ldots, N$. Under the null hypothesis, each of these samples is equally likely. Then calculate the rank sum (= test statistic) for each of these cases. Tabulate these rank sums to get the probability mass function from the relative frequencies (the statistic is discrete, so this is a mass function rather than a density). The cumulative sum of these relative frequencies is the cumulative distribution function.

rankMat <- combn(1:N, n1)               # all possible ranks within group 1
LnPl    <- colSums(rankMat)             # all possible rank sums for group 1
dWRS    <- table(LnPl) / choose(N, n1)  # relative frequencies of rank sums: pmf
pWRS    <- cumsum(dWRS)                 # cumulative sums: cdf
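The enumeration stores all ${N \choose n_{1}}$ samples, which grows quickly. A cheaper alternative is to count, by dynamic programming, how many subsets of size $n_{1}$ reach each rank sum. The sketch below is one way to do that; the function name rankSumCounts() is hypothetical, and this is not claimed to be the algorithm used in wilcox.c. It is checked against the enumeration for small illustrative group sizes 4 and 5.

```r
# Count subsets of {1, ..., n1+n2} of size n1 for every possible rank sum.
# f[k+1, s+1] holds the number of subsets of size k with sum s
# among the ranks processed so far.
rankSumCounts <- function(n1, n2) {
    N    <- n1 + n2
    maxS <- sum((N - n1 + 1):N)         # largest possible rank sum
    f    <- matrix(0, nrow=n1 + 1, ncol=maxS + 1)
    f[1, 1] <- 1                        # the empty subset has sum 0
    for(r in 1:N) {                     # add rank r as a candidate member
        for(k in min(r, n1):1) {        # descending k: each rank used at most once
            idx <- (r + 1):(maxS + 1)
            f[k + 1, idx] <- f[k + 1, idx] + f[k, idx - r]
        }
    }
    s <- (n1 * (n1 + 1) / 2):maxS       # attainable rank sums
    setNames(f[n1 + 1, s + 1], s)
}

cnt   <- rankSumCounts(4, 5)            # counts per rank sum
dDP   <- cnt / choose(9, 4)             # exact pmf from the counts
dEnum <- table(colSums(combn(1:9, 4))) / choose(9, 4)  # pmf by enumeration
all.equal(as.vector(dDP), as.vector(dEnum))
```

The table has only $(n_{1}+1)$ rows and one column per possible sum, so memory no longer depends on the number of samples.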

Compare the exact distribution against the asymptotically correct normal distribution.

muLnPl  <- (n1    * (N+1)) /  2         # expected value
varLnPl <- (n1*n2 * (N+1)) / 12         # variance

rankSums <- as.numeric(names(pWRS))     # rank-sum values as numbers for the x-axis
plot(rankSums, pWRS, main="Wilcoxon RS, N=(12, 12): exact vs. asymptotic",
     type="n", xlab="ln+", ylab="P(Ln+ <= ln+)", cex.lab=1.4)
curve(pnorm(x, mean=muLnPl, sd=sqrt(varLnPl)), lwd=4, n=200, add=TRUE)
points(rankSums, pWRS, pch=16, col="red", cex=0.7)
abline(h=0.95, col="blue")              # reference line at cumulative probability 0.95
legend(x="bottomright", legend=c("exact", "asymptotic"),
       pch=c(16, NA), col=c("red", "black"), lty=c(NA, 1), lwd=c(NA, 4))
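For small groups, the gap between the exact and the normal cdf can also be summarized numerically instead of read off a plot. This is a rough sketch using illustrative group sizes 4 and 5 (the names m, n, errPlain and errCC are assumptions); it reports the largest absolute difference, with and without a continuity correction of 0.5.

```r
m     <- 4                              # size group 1 (illustrative)
n     <- 5                              # size group 2 (illustrative)
u     <- 0:(m*n)                        # support of U
w     <- u + m*(m+1)/2                  # corresponding rank sums of group 1
exact <- pwilcox(u, m, n)               # exact cdf
mu    <- m * (m + n + 1) / 2            # expected rank sum
sigma <- sqrt(m * n * (m + n + 1) / 12) # standard deviation of rank sum

errPlain <- max(abs(exact - pnorm(w,       mean=mu, sd=sigma)))
errCC    <- max(abs(exact - pnorm(w + 0.5, mean=mu, sd=sigma)))
c(plain=errPlain, continuityCorrected=errCC)
```

Since the exact values are cheap to obtain at these sizes, the approximation error is mainly of interest for deciding when the normal result can be fused with exact ones.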
