Interpretation of p-value in Mann-Whitney Rank Test

hypothesis testing, scipy, wilcoxon-mann-whitney-test

I am trying out the Mann-Whitney rank test on two vectors, a and b. The vectors are almost identical, so I expected a p-value near 0, but the returned p-value is near 1. What is the reason? I read the manual and also ran the code with different parameters, but I don't get anything close to what I expect.

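from scipy.stats import mannwhitneyu

a = [1000,100,10,1,10,100,1000,10000,1000,100,10,1]
b = [999,100,10,1,10,100,1000,10000,1000,100,10,1]

# Request the two-sided test explicitly, because the default
# `alternative` has differed between SciPy versions
print(mannwhitneyu(a, b, alternative='two-sided'))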

The output of the code:

MannwhitneyuResult(statistic=73.5, pvalue=0.95291544831453867)

Edit:

Let's formalize the problem to eliminate any misunderstanding (please edit the question if something is wrong):

What I am trying to show is that the rank distributions of the two samples are approximately equal:

Null hypothesis ($H_0$): "the ranked distributions of a and b are approximately equal"

Alternative hypothesis ($H_a$): "the ranked distributions of a and b are not equal"

Significance level (alpha): 0.05

p-value: 0.95

Since p-value > alpha, there is not sufficient evidence to reject $H_0$ in favor of $H_a$, but we also cannot conclude that $H_0$ is true.

If instead p-value < alpha, we would have enough evidence against $H_0$, and $H_a$ could be accepted.

So what happened here is that I could not reject the null hypothesis. Failing to reject it does not prove that the null hypothesis is true, nor does it show that it is false. It is like an investigation into whether Mr. X is guilty:

$H_0$ = "Mr. X is guilty"
$H_a$ = "Mr. X is not guilty"

We suspect he is not guilty, but we don't have enough evidence against $H_0$ (p-value > alpha). That does not mean $H_0$ is false. If we could obtain enough evidence (p-value < alpha), we would reject $H_0$ and conclude that $H_a$ is true, i.e. that he is not guilty.

Best Answer

The p-value represents the probability of getting a test-statistic at least as extreme$^\dagger$ as the one you had in your sample, if the null hypothesis were true.
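As a rough illustration of that definition, one can approximate the p-value for the question's data by simulation: shuffle the pooled values between the two groups many times and count how often the U statistic lands at least as far from its null expectation as the observed one. This is only a sketch under assumptions of mine (the helper `u_statistic`, the seed, and the number of shuffles are all illustrative choices), not how SciPy computes its result.

```python
import numpy as np
from scipy.stats import rankdata

a = [1000, 100, 10, 1, 10, 100, 1000, 10000, 1000, 100, 10, 1]
b = [999, 100, 10, 1, 10, 100, 1000, 10000, 1000, 100, 10, 1]


def u_statistic(x, y):
    """Mann-Whitney U for x (hypothetical helper), using midranks for ties."""
    ranks = rankdata(np.concatenate([x, y]))
    n_x = len(x)
    return ranks[:n_x].sum() - n_x * (n_x + 1) / 2


rng = np.random.default_rng(0)    # fixed seed, arbitrary choice
pooled = np.array(a + b, dtype=float)
n_a = len(a)
null_mean = len(a) * len(b) / 2   # expected value of U when H0 holds

observed = u_statistic(a, b)

n_perm = 10_000                   # number of random relabelings (assumption)
extreme = 0
for _ in range(n_perm):
    shuffled = rng.permutation(pooled)
    u = u_statistic(shuffled[:n_a], shuffled[n_a:])
    # "At least as extreme" = at least as far from the null expectation
    if abs(u - null_mean) >= abs(observed - null_mean):
        extreme += 1

perm_pvalue = extreme / n_perm
print(f"observed U = {observed}, permutation p-value = {perm_pvalue:.3f}")
```

Almost every random relabeling produces a U at least as far from 72 as the observed 73.5, which is why the p-value is large.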

A high p-value indicates you saw something really consistent with the null hypothesis (e.g. tossing 151 heads in 300 tosses of a coin you're examining for fairness), and something that's really consistent with the null being true would not cause you to think it was false. (In some situations it might perhaps lead you to think more carefully about the assumptions.)
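The coin example can be checked with an exact binomial test. The sketch below assumes a SciPy version that provides `scipy.stats.binomtest` (1.7 or later, as far as I know); the variable names are mine.

```python
from scipy.stats import binomtest

# Two-sided exact test of H0: P(heads) = 0.5, given 151 heads in 300 tosses
result = binomtest(k=151, n=300, p=0.5, alternative="two-sided")
coin_pvalue = result.pvalue

print(f"p-value for 151 heads in 300 tosses: {coin_pvalue:.3f}")
```

The p-value comes out close to 1: a result this near to 150 heads gives essentially no reason to doubt that the coin is fair.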

If you thought that a and b were very similar in values then you'd expect to obtain a high p-value, not a low one. (If you expected a low p-value, you may have some misunderstanding of how p-values work.)

That is, the p-value from the test is consistent with your statement about them being very similar*.

Low p-values are what would lead you to doubt the null.
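To see the contrast, compare a against b and then against a sample that clearly sits higher. In this sketch, `c` is a made-up vector (a scaled by 100000 so that every value in `c` exceeds every value in `a`); it is not part of the original question.

```python
from scipy.stats import mannwhitneyu

a = [1000, 100, 10, 1, 10, 100, 1000, 10000, 1000, 100, 10, 1]
b = [999, 100, 10, 1, 10, 100, 1000, 10000, 1000, 100, 10, 1]

# Hypothetical third sample: every value lies above every value in a
c = [100000 * v for v in a]

similar = mannwhitneyu(a, b, alternative="two-sided")
shifted = mannwhitneyu(a, c, alternative="two-sided")

print(f"a vs b: p-value = {similar.pvalue:.4f}")   # nearly identical samples
print(f"a vs c: p-value = {shifted.pvalue:.2e}")   # clearly separated samples
```

The nearly identical pair gives a p-value near 1, while the clearly separated pair gives a very small one.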


A caveat: the fact that the values move together in pairs across many orders of magnitude indicates a very high correlation, so the assumption of independence is untenable (the data seem to be paired). It should surely raise doubts about the suitability of the test. (I presume you made up the values to see how the test behaves, but if that's real data you have a problem.)
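One quick way to see the pairing is to look at the rank correlation between a and b position by position. This is only a sketch, and it assumes that matching the values by position is meaningful for this data.

```python
from scipy.stats import spearmanr

a = [1000, 100, 10, 1, 10, 100, 1000, 10000, 1000, 100, 10, 1]
b = [999, 100, 10, 1, 10, 100, 1000, 10000, 1000, 100, 10, 1]

# Spearman rank correlation between values at the same position
rho, _ = spearmanr(a, b)
print(f"Spearman correlation between a and b: {rho:.3f}")
```

A correlation this close to 1 is what you would not expect from two independent samples.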


$\dagger$ away from what you'd expect to see under the null, in the direction of what you'd see under the alternative

* however note that values may in some circumstances be very similar (at least in some sense) without the p-value being high.
