Mann-Whitney null hypothesis under unequal variance

hypothesis testing, variance, wilcoxon-mann-whitney-test

I'm just curious about the null hypothesis of a Mann-Whitney U test. I often see it stated that the null hypothesis is that two populations have equal distributions. But I'm thinking – if I had two normal populations with the same mean but extremely unequal variance, the Mann-Whitney test would probably not detect this difference.

I have also seen it stated that the null hypothesis of the Mann-Whitney test is $\Pr(X>Y)=0.5$, that is, the probability that an observation from one population ($X$) exceeds an observation from the second population ($Y$), after exclusion of ties, is 0.5. This seems to make a bit more sense, but it does not seem equivalent to the first null hypothesis I stated.

I'm hoping to get a bit of help untangling this. Thanks!

Best Answer

The Mann-Whitney test is a special case of a permutation test (the distribution under the null is derived by looking at all the possible permutations of the data), and the null hypothesis of a permutation test is that the distributions are identical, so the first statement is technically correct.

One way of thinking about the Mann-Whitney test statistic is as a count of the number of times a randomly chosen value from one group exceeds a randomly chosen value from the other group. So $\Pr(X>Y)=0.5$ also makes sense, and it is technically a consequence of the equal-distributions null (assuming continuous distributions, where the probability of a tie is 0). If the two distributions are the same, then the probability of $X$ being greater than $Y$ is 0.5, since both are drawn from the same distribution.
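As a small illustration of that reading of the statistic (a sketch, not part of the original answer): in R, the `W` value reported by `wilcox.test` equals, when there are no ties, the number of pairs $(x_i, y_j)$ with $x_i > y_j$, so dividing it by the number of pairs gives an estimate of $\Pr(X>Y)$. The seed, sample sizes, and variable names below are arbitrary choices made for illustration.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
x <- rnorm(25, 0, 2)             # sample from the first population
y <- rnorm(25, 0, 10)            # sample from the second population

# W as reported by wilcox.test (x is the first sample)
w <- unname(wilcox.test(x, y)$statistic)

# Direct count of pairs (x_i, y_j) with x_i > y_j
pairs_x_greater <- sum(outer(x, y, ">"))

# Estimate of Pr(X > Y): fraction of all 25 * 25 pairs in which x is larger
p_hat <- pairs_x_greater / (length(x) * length(y))

print(c(W = w, count = pairs_x_greater, p_hat = p_hat))
```

The first two printed values should agree, which is why the test can be read as a test about how often one group's values exceed the other's.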

The case described in the question, two distributions with the same mean but very different variances, satisfies the second null hypothesis ($\Pr(X>Y)=0.5$) but not the first (identical distributions). We can run a simulation to see how the p-values behave in this case (in theory, they should be uniformly distributed if the null being tested holds):

> out <- replicate( 100000, wilcox.test( rnorm(25, 0, 2), rnorm(25,0,10) )$p.value )
> hist(out)
> mean(out < 0.05)
[1] 0.07991
> prop.test( sum(out<0.05), length(out), p=0.05 )

        1-sample proportions test with continuity correction

data:  sum(out < 0.05) out of length(out), null probability 0.05
X-squared = 1882.756, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.05
95 percent confidence interval:
 0.07824054 0.08161183
sample estimates:
      p 
0.07991 

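So the test clearly rejects more often than the nominal 5% (about 8% of the time), even though $\Pr(X>Y)=0.5$ holds in this simulation. The p-values behave as expected if the null is equality of distributions, which is false here, and not as they would if the null were only $\Pr(X>Y)=0.5$, which is true here.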
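To see why the second null is true in this simulation (a short derivation added for illustration, assuming the two samples are independent): the third argument of `rnorm` is the standard deviation, so $X \sim N(0, 2^2)$ and $Y \sim N(0, 10^2)$, and

$$
X - Y \sim N(0,\; 2^2 + 10^2) = N(0,\; 104), \qquad \Pr(X > Y) = \Pr(X - Y > 0) = \tfrac{1}{2},
$$

because a normal distribution centered at 0 puts half of its mass above 0. So $\Pr(X>Y)=0.5$ holds exactly here even though the two distributions are not identical.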

Thinking in terms of the probability that $X > Y$ also runs into some interesting problems if you ever compare populations that behave like Efron's dice, a set of non-transitive dice.
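A minimal sketch of the dice example follows. The face values are the commonly quoted set of Efron's dice, supplied here as an assumption since the answer does not list them, and the helper `p_beats` is a hypothetical name. Each die beats the next one in the cycle with probability 2/3, so "tends to be larger than" is not a transitive relation:

```r
# Commonly quoted faces of Efron's four dice (assumed; not given in the answer)
dice <- list(
  A = c(4, 4, 4, 4, 0, 0),
  B = c(3, 3, 3, 3, 3, 3),
  C = c(6, 6, 2, 2, 2, 2),
  D = c(5, 5, 5, 1, 1, 1)
)

# Exact Pr(X > Y) for one roll of each die:
# the fraction of the 36 equally likely face pairs in which x is larger
p_beats <- function(x, y) mean(outer(x, y, ">"))

cycle <- c(
  "A>B" = p_beats(dice$A, dice$B),
  "B>C" = p_beats(dice$B, dice$C),
  "C>D" = p_beats(dice$C, dice$D),
  "D>A" = p_beats(dice$D, dice$A)
)
print(cycle)   # each entry is 2/3
```

No ordering of the four populations is consistent with all four comparisons, which is what makes pairwise statements about $\Pr(X>Y)$ hard to interpret across more than two groups.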
