Solved – Mann-Whitney U test and K-S test with unequal sample sizes

kolmogorov-smirnov test, sample-size, wilcoxon-mann-whitney-test

I want to compare two distributions to see if they are significantly different. They represent task completion times (ranging from 1 to around 1000 seconds) in two different months. They are not normally distributed. I want to see if their central tendencies are significantly different (at first glance the mode, mean and median of the two months seem very close, just 3-4 seconds apart), but also to see if their shapes are similar (again, at first glance, they look similar).
I am currently carrying out this analysis in SPSS 20.
I am using the Mann-Whitney test to compare central tendencies and the Kolmogorov-Smirnov test to compare the shapes of the distributions (although I have read that the K-S test is an overall comparison of the two distributions).

Also, in the first month I have 300,000 observations and in the second month 122,000 observations. So, a lot of data … but disproportionate. Is this an impediment to running these tests, the fact that the sample sizes are not equal? I ran both Mann-Whitney and K-S and they both seem to reject the null. How much should I trust the results given my sample sizes? Do you suggest any alternative tests?
Thanks

Best Answer

With such large sample sizes, both tests will have high power to detect even minor differences. The two distributions could be almost identical, with a small difference in shape or location that is of no practical importance, and the tests would still reject (because the distributions are, in fact, different).
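To make the point about power concrete, here is a minimal sketch that holds a K-S statistic fixed and only changes the sample sizes. The value D = 0.01 is a made-up illustration, not a result from the question's data, and the p-value uses the standard asymptotic Kolmogorov series rather than whatever SPSS computes internally.

```python
import math

def ks_asymptotic_pvalue(d, n, m, terms=100):
    """Asymptotic two-sided p-value for a two-sample K-S statistic d."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.1:
        # The series converges slowly near zero, where the p-value is numerically 1.
        return 1.0
    # Kolmogorov series: Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

# Hypothetical: the two empirical CDFs never differ by more than 1 percentage point.
d = 0.01
p_large = ks_asymptotic_pvalue(d, 300_000, 122_000)  # sample sizes from the question
p_small = ks_asymptotic_pvalue(d, 3_000, 1_220)      # same ratio, 100x fewer observations
print(f"n=300000, m=122000: p = {p_large:.2e}")
print(f"n=3000,   m=1220:   p = {p_small:.3f}")
```

The same maximum CDF gap is overwhelmingly "significant" at the question's sample sizes and unremarkable at a hundredth of them, so the rejection mostly reflects how much data there is, not how large the difference is.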

If all you really care about is a statistically significant difference, then you can be happy with the results of the K-S test (and of others; even a t-test is meaningful with non-normal data at those sample sizes, thanks to the Central Limit Theorem).
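As a rough sketch of the CLT-based comparison mentioned above, the following computes a large-sample (Welch-style, normal approximation) confidence interval and p-value for the difference in means. The function name `mean_difference_ci` and the simulated lognormal "task times" are assumptions made for illustration; they are not the asker's data.

```python
import math
import random
from statistics import NormalDist, fmean, variance

def mean_difference_ci(a, b, level=0.95):
    """Large-sample estimate, confidence interval and p-value for mean(a) - mean(b)."""
    diff = fmean(a) - fmean(b)
    # Unequal variances and unequal sample sizes are both allowed here.
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    return diff, (diff - z_crit * se, diff + z_crit * se), p_value

rng = random.Random(0)
# Hypothetical skewed task times in seconds; the second month is shifted by 3 s.
month1 = [rng.lognormvariate(4.0, 1.0) for _ in range(30_000)]
month2 = [rng.lognormvariate(4.0, 1.0) + 3.0 for _ in range(12_200)]
diff, ci, p = mean_difference_ci(month2, month1)
print(f"difference = {diff:.2f} s, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), p = {p:.3g}")
```

Reporting the interval in seconds alongside the p-value makes it easier to judge whether a difference of a few seconds matters for the task at hand.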

If you care about practical or meaningful differences then things become subjective, but you can compare using various plots to help you decide if you think there are differences that are enough to care about.
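A numerical companion to such plots is to compare matching quantiles of the two samples, which is what a quantile-quantile plot shows graphically. The sketch below is one possible way to do that; `decile_differences` is a hypothetical helper and the data are made up so that the expected shift is known.

```python
from statistics import quantiles

def decile_differences(a, b):
    """Differences between matching deciles of b and a (b minus a)."""
    qa = quantiles(a, n=10)  # nine cut points: 10%, 20%, ..., 90%
    qb = quantiles(b, n=10)
    return [y - x for x, y in zip(qa, qb)]

# Hypothetical data: the second sample is the first shifted by 3 seconds.
first = list(range(1, 1001))
second = [t + 3 for t in first]
for pct, d in zip(range(10, 100, 10), decile_differences(first, second)):
    print(f"{pct}th percentile: {d:+.1f} s")
```

Roughly constant differences suggest a pure shift in location; differences that grow or change sign across the deciles point to a change in shape or spread.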

Another possibility is doing a visual test, as documented in:

 Buja, A., Cook, D., Hofmann, H., Lawrence, M., Lee, E.-K., Swayne,
 D. F. and Wickham, H. (2009). Statistical inference for exploratory
 data analysis and model diagnostics. Phil. Trans. R. Soc. A 367,
 4361-4383. doi: 10.1098/rsta.2009.0120

The vis.test function in the TeachingDemos package for R helps implement the test, but it can be done by hand as well.

Basically, you create a bunch of graphs and then see if you can tell which is which. For your question, one possibility would be to create a histogram of the 122,000 observations from the second month, then take several samples of 122,000 from the 300,000 observations of the first month and create a histogram of each of those samples. Then present someone (or several someones) with all the histograms in random order and see if they can pick out the one that represents the second month. If they consistently pick out the correct graph, that says there is something visually different, and you can further explore how the distributions differ. If they don't pick out the correct graph, that suggests that while there may be a statistically significant difference, it is not important enough to distinguish them visually.
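The data-preparation half of that procedure can be sketched in a few lines of Python. This is not the `vis.test` function from TeachingDemos; `build_lineup` is a hypothetical helper, the default of 19 decoys is an arbitrary choice, and the simulated months are scaled-down stand-ins for the real data. Plotting is left out.

```python
import random

def build_lineup(target, pool, n_decoys=19, seed=None):
    """Return (panels, answer): the target sample hidden among decoys drawn from pool."""
    rng = random.Random(seed)
    # Each decoy is a subsample of the pool with the same size as the target.
    panels = [rng.sample(pool, len(target)) for _ in range(n_decoys)]
    answer = rng.randrange(n_decoys + 1)  # position where the real sample is hidden
    panels.insert(answer, list(target))
    return panels, answer

rng = random.Random(1)
# Hypothetical stand-ins for the two months, scaled down 100x.
month1 = [rng.lognormvariate(4.0, 1.0) for _ in range(3_000)]
month2 = [rng.lognormvariate(4.0, 1.0) for _ in range(1_220)]
panels, answer = build_lineup(month2, month1, n_decoys=19, seed=2)
print(len(panels), "panels; chance of a correct random guess:", 1 / len(panels))
```

Each panel would then be drawn as a histogram with identical bins and axes and shown to a viewer who does not know `answer`. With 20 panels, a viewer guessing at random picks the real one with probability 1/20.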