Mann-Whitney U test for sample sizes 65 and 10 in Python

python, wilcoxon-mann-whitney-test

I would like to run the Mann-Whitney U test to compare scores on some measure between two groups of sizes 65 and 10.

I have two questions regarding the mannwhitneyu function in Python's scipy.stats module:

  • First of all, the documentation says:

Use only when the number of observation in each sample is > 20 and you
have 2 independent samples of ranks. Mann-Whitney U is significant if
the u-obtained is LESS THAN or equal to the critical value of U.

So I assume I can't use it, since the size of my second sample is < 20. The only alternative I know of applies when both sample sizes are < 20: calculate $U$ by hand and compare it against a table of critical values. However, the only tables I can find cover the case where both sample sizes are < 20.

So is there a version of the Mann-Whitney U test I can use for these sample sizes (preferably with an implementation in Python)? Or should I use an entirely different test, such as Mood's median test?

My second question is more general and concerns mannwhitneyu in scipy:

  • It seems to output the test statistic $U$ and a $p$-value. If the sample sizes are greater than 20 and you want to determine whether the result is statistically significant, is the $p$-value that is output the one for the normalised $U$, i.e. the $p$-value for $z = \frac{U - m_U}{\sigma_U}$? If so, can you just read off this $p$-value to determine the statistical significance of the difference between the two groups? And in that case, what is the point of the function outputting $U$? What would you do with that information?

Best Answer

The restriction exists because the function uses the normal approximation rather than computing exact p-values. Even with sample sizes of 10 and 65, the normal approximation isn't so bad out to about 2.5 standard deviations from the mean (that is, down to two-tailed p-values of about 0.01), but further out than that it starts to get pretty poor.
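To make the normal approximation concrete, here is a minimal sketch that computes $U$, the standardised value $z = (U - m_U)/\sigma_U$ and the resulting two-tailed p-value by hand, using only the standard library. The function name `mann_whitney_normal_approx` is made up for this illustration, and the sketch assumes no tied values and applies no continuity correction, so it may not reproduce a library's output exactly.

```python
import math


def mann_whitney_normal_approx(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Assumes no ties (tied pairs are counted as 1/2, but the variance is
    not tie-corrected) and uses no continuity correction.
    """
    n1, n2 = len(x), len(y)
    # U for sample x: number of (x_i, y_j) pairs with x_i > y_j.
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    # Mean and standard deviation of U under the null hypothesis.
    m_u = n1 * n2 / 2
    sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - m_u) / sigma_u
    # Two-tailed p-value: 2 * P(Z > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p


# Tiny illustration with made-up data.
u, z, p = mann_whitney_normal_approx([1, 2, 3], [4, 5, 6])
print(u, z, p)
```

This also speaks to the second question: under these assumptions the reported p-value is the one for the standardised $U$, so the p-value alone is enough to judge significance, while $U$ itself is mainly useful for reporting or for looking up critical values in a table.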

There's no reason why exact values can't be computed (R will calculate the entire distribution of the test statistic under the null at these sample sizes quite happily, for example).
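The same can be done in Python. Below is a rough sketch, not a reference implementation, that builds the entire null distribution of $U$ by counting rank subsets with dynamic programming and then reads off an exact two-tailed p-value. The function names `exact_u_distribution` and `exact_two_sided_p` are hypothetical, the sketch assumes no ties, and doubling the smaller tail is only one of several conventions for a two-sided exact p-value.

```python
from math import comb


def exact_u_distribution(n1, n2):
    """Exact null pmf of U (no ties): returns pmf[u] for u = 0..n1*n2.

    The null distribution of U is symmetric and identical for either
    sample, so we count over the smaller sample to keep the table small.
    """
    m = min(n1, n2)
    n = n1 + n2
    min_sum = m * (m + 1) // 2             # rank sum when U = 0
    max_sum = min_sum + n1 * n2            # rank sum when U = n1*n2
    # ways[k][s] = number of ways to choose k of the ranks processed so
    # far such that they add up to s.
    ways = [[0] * (max_sum + 1) for _ in range(m + 1)]
    ways[0][0] = 1
    for r in range(1, n + 1):
        # Go through k in descending order so rank r is used at most once.
        for k in range(min(r, m), 0, -1):
            prev, cur = ways[k - 1], ways[k]
            for s in range(max_sum - r + 1):
                if prev[s]:
                    cur[s + r] += prev[s]
    total = comb(n, m)                     # all equally likely rank subsets
    return [ways[m][min_sum + u] / total for u in range(n1 * n2 + 1)]


def exact_two_sided_p(u, n1, n2):
    """Exact two-sided p-value: twice the smaller tail, capped at 1."""
    pmf = exact_u_distribution(n1, n2)
    u = int(u)
    lower = sum(pmf[: u + 1])              # P(U <= u)
    upper = sum(pmf[u:])                   # P(U >= u)
    return min(1.0, 2 * min(lower, upper))


# Sizes from the question; the observed U = 150 is made up for illustration.
print(exact_two_sided_p(150, 65, 10))
```

In practice there is no need to hand-roll this: SciPy 1.7 and later accept `method="exact"` in `scipy.stats.mannwhitneyu`, which computes an exact p-value but does not correct for ties.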
