Solved – Getting a P value of 1 when medians/means are different (Wilcoxon rank sum test)

p-value, wilcoxon-mann-whitney-test

I have just performed a Wilcoxon rank sum test on a sample of 22 continuous data points (11 in each group) and am getting a P value of exactly 1 (not rounded).

I've read that this can only happen when the sample values are exactly equal, but mine aren't:

> mean(diversity[habitat=="scrub"])
[1] 1.804455
> median(diversity[habitat=="scrub"])
[1] 1.983
> mean(diversity[habitat=="forest"])
[1] 1.819818
> median(diversity[habitat=="forest"])
[1] 1.95

I really don't think I've done anything wrong/differently to normal, so I'm inclined to accept the value. Can anyone explain why I am getting this value? Is it just because the two samples are SO similar? Or something to do with the continuity correction?

Thanks.

Best Answer

Note that it's quite possible for samples from two continuous distributions to yield rank sums equal to their expected value under the null, or rank sums that differ by the smallest possible amount (1, as in this case). In either case every possible arrangement of the ranks is "at least as extreme" as the observed one in the two-tailed test, so the p-value is 1.

That is, you can quite easily get a p-value of exactly 1 without any of the values being equal to any other value.
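To make the arithmetic concrete for the group sizes in the question (11 and 11), here is a minimal sketch of the exact two-sided p-value written out by hand with `pwilcox`. The helper name `exact_p` is made up for illustration, and it assumes no ties and the usual "double the smaller tail, cap at 1" definition:

```r
# Group sizes from the question. W is the statistic reported by wilcox.test
# (rank sum of the first group minus n1 * (n1 + 1) / 2).
n1 <- 11
n2 <- 11
n1 * n2 / 2   # expected value of W under the null: 60.5

# Hypothetical helper: exact two-sided p-value, assuming no ties.
exact_p <- function(w, m, n) {
  p <- if (w > m * n / 2) {
    pwilcox(w - 1, m, n, lower.tail = FALSE)  # P(W >= w)
  } else {
    pwilcox(w, m, n)                          # P(W <= w)
  }
  min(2 * p, 1)
}

# The two values of W closest to 60.5; each tail has probability 1/2,
# so both p-values come out as 1 (up to floating-point rounding).
exact_p(60, n1, n2)
exact_p(61, n1, n2)
```

Because W only takes integer values and its null distribution is symmetric about 60.5, P(W >= 61) = P(W <= 60) = 1/2, and doubling that gives 1.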

For example, imagine we have the following 22 (combined & sorted) sample values:

   1.961  4.160  6.561  6.633  7.454  7.958  8.200  8.488  8.635  8.698
   8.881  9.099 10.086 11.178 11.711 11.926 12.546 13.026 13.242 14.025
  14.822 17.167

Then suppose (for example) the two groups of 11 took the items at the following positions in that list:

   g1:  2  3  6  7 10 11 14 15 18 19 22
   g2:  1  4  5  8  9 12 13 16 17 20 21

(Since the list is sorted, these positions are also the ranks.)

That is, the two groups have the following data:

y1: 4.160 6.561 7.958 8.200 8.698 8.881 11.178 11.711 13.026 13.242 17.167
y2: 1.961 6.633 7.454 8.488 8.635 9.099 10.086 11.926 12.546 14.025 14.822

Then the rank sums of the two groups differ by only 1 (127 versus 126; without ties they cannot differ by less, because the total, 253, is odd), and the p-value must then be exactly 1:

> wilcox.test(y1,y2)

        Wilcoxon rank sum test

data:  y1 and y2
W = 61, p-value = 1
alternative hypothesis: true location shift is not equal to 0

Yet both the means and medians are different.
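These claims can be checked directly. The sketch below rebuilds the two groups from the pooled values and the positions listed above (the variable names `pooled` and `i1` are mine), then compares rank sums, means and medians:

```r
# Pooled, sorted sample values from the example.
pooled <- c( 1.961,  4.160,  6.561,  6.633,  7.454,  7.958,  8.200,  8.488,
             8.635,  8.698,  8.881,  9.099, 10.086, 11.178, 11.711, 11.926,
            12.546, 13.026, 13.242, 14.025, 14.822, 17.167)

# Positions (= ranks) assigned to group 1; group 2 gets the rest.
i1 <- c(2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22)
y1 <- pooled[i1]
y2 <- pooled[-i1]

# Rank sums within the combined sample.
r <- rank(c(y1, y2))
sum(r[1:11])    # 127
sum(r[12:22])   # 126

# Location summaries differ between the groups.
mean(y1);   mean(y2)     # about 10.07 and 9.61
median(y1); median(y2)   # 8.881 and 9.099
```

Note that the mean is larger in the first group while the median is larger in the second.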

[There are many ways to split the values 1,2,...,22 up into two sets of 11 so that the sum of each set is either 126 or 127 -- i.e. 253/2 rounded up or down; this particular one just happened to be easy to obtain.]
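To get a feel for how many such splits there are, one option is to use the exact null distribution via `dwilcox`. This is only a sketch: it relies on a rank sum of 126 or 127 in a group of 11 corresponding to W = 60 or W = 61 (the rank sum minus 11 * 12 / 2 = 66), and the variable names are mine:

```r
# Total number of ways to pick 11 of the 22 ranks for the first group.
n_splits <- choose(22, 11)

# Under the null every split is equally likely, so probability * n_splits
# gives the number of splits with that value of W.
n_balanced <- n_splits * (dwilcox(60, 11, 11) + dwilcox(61, 11, 11))

round(n_balanced)        # number of splits with rank sum 126 or 127
n_balanced / n_splits    # share of all splits that give a p-value of 1
```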

Note that the Wilcoxon rank sum test is neither a test of means nor a test of medians, and both may differ while the test sees the two samples as not different. Conversely, you could be in a situation where the means are the same, or the medians are the same (or even both means and medians are equal across samples at the same time), while the Wilcoxon rank sum test rejects the null (because it considers neither of them).
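As a small illustration of the second situation, here is a made-up pair of samples (the numbers are invented for this sketch, not taken from the question) with identical means, where the rank sum test nonetheless rejects at the usual levels:

```r
# Nine values of `a` lie above every value of `b`; one extreme low value
# pulls the mean of `a` down to match the mean of `b`.
a <- c(21:29, -70)
b <- 11:20

mean(a) == mean(b)          # TRUE: both means are 15.5

# No ties and small samples, so wilcox.test uses the exact distribution.
wilcox.test(a, b)$p.value   # small (well below 0.05)
```

The test only sees that `a` holds ranks 1 and 12 to 20; how far out the low value lies makes no difference to it.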

(I'd regard the advice in comments of "try a t-test" to amount to p-hacking. I see no reason whatever to abandon the test you did.)
