Solved – Error in scipy Wilcoxon signed-rank test

error, scipy, wilcoxon-signed-rank

I was digging into the scipy code for the Wilcoxon signed-rank test (stats.wilcoxon) and found that scipy computes the sum of the ranks separately for the positive differences and for the negative ones. It then picks the smaller of the two sums and uses that as W. That is substantially different from the explanation of the test on Wikipedia and other sites, where W is the sum of the signed ranks of all differences.

Is this approach valid?

Best Answer

First, the wilcoxon test in scipy.stats does NOT use $W$ as the test statistic; it instead uses $T$ as defined in Siegel's popular book, Nonparametric Statistics for the Behavioral Sciences. And yes, as @whuber correctly pointed out, once you know $T$ and the sample size, $W$ is also defined (@whuber, strictly speaking, not quite: one also needs to know how zero differences are handled).
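To make the link between $T$ and $W$ concrete, here is a minimal pure-Python sketch that computes both rank sums by hand for the data used in the example in this answer. It is not scipy's code: the helper `average_ranks` and the variable names are made up for illustration, and zero differences are simply dropped, which is only one of several conventions. With no zero differences the two rank sums add up to $n(n+1)/2$, so $|W| = n(n+1)/2 - 2T$.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


x1 = [48, 7, 12, 11, 62, 93, 79, 53, 28, 49, 74, 59, 57, 62, 22, 8, 30, 11, 2, 47]
x2 = [20, 13, 41, 61, 93, 11, 28, 61, 26, 91, 95, 5, 80, 45, 88, 99, 50, 96, 69, 93]

# Assumption: zero differences are discarded before ranking.
d = [a - b for a, b in zip(x1, x2) if a != b]
n = len(d)
ranks = average_ranks([abs(v) for v in d])

w_plus = sum(r for r, v in zip(ranks, d) if v > 0)   # rank sum of positive differences
w_minus = sum(r for r, v in zip(ranks, d) if v < 0)  # rank sum of negative differences

T = min(w_plus, w_minus)  # the smaller rank sum (Siegel's T)
W = w_plus - w_minus      # the signed-rank sum

print(w_plus, w_minus, T, W)  # 60.0 150.0 60.0 -90.0
```

Here $n = 20$, so $n(n+1)/2 = 210$ and $210 - 2 \cdot 60 = 90 = |W|$. Note that $T$ alone fixes only the magnitude of $W$; the sign depends on which of the two rank sums is the smaller one.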

One can only know how the test is implemented by reading the source code. For scipy, the Wilcoxon test can be found in your_python_package_folde/scipy/stats/morestats.py. Compared to R's wilcox.test, it is very simple. Go over the code, and you will see that it is equivalent to calling wilcox.test in R with correct=FALSE, exact=FALSE, paired=TRUE.

Python:

>>> from scipy import stats
>>> x1=[48,  7, 12, 11, 62, 93, 79, 53, 28, 49, 74, 59, 57, 62, 22,  8, 30, 11,  2, 47]
>>> x2=[20, 13, 41, 61, 93, 11, 28, 61, 26, 91, 95,  5, 80, 45, 88, 99, 50, 96, 69, 93]
>>> stats.wilcoxon(x1, x2) # T and p value, two-sided
(60.0, 0.092963126712486244)

In R:

> x1<-c(48,  7, 12, 11, 62, 93, 79, 53, 28, 49, 74, 59, 57, 62, 22,  8, 30, 11,  2, 47)
> x2<-c(20, 13, 41, 61, 93, 11, 28, 61, 26, 91, 95,  5, 80, 45, 88, 99, 50, 96, 69, 93)
> wilcox.test(x1,x2,correct=FALSE,exact=FALSE,paired=TRUE)

    Wilcoxon signed rank test

data:  x1 and x2 
V = 60, p-value = 0.09296
alternative hypothesis: true location shift is not equal to 0 
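The p-value that both programs report can be reproduced by hand with the normal approximation and no continuity correction, which is what correct=FALSE, exact=FALSE ask for in R. The sketch below is an illustration rather than either library's code: the function name `normal_approx_p` is hypothetical, and the variance formula assumes no ties and no zero differences (true for this data set).

```python
import math


def normal_approx_p(t, n):
    """Two-sided p-value for a signed-rank sum t with n non-zero differences.

    Uses the normal approximation without continuity correction and
    without a tie correction (assumes all |differences| are distinct).
    """
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t - mean) / sd
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


z, p = normal_approx_p(60, 20)
print(f"{z:.3f} {p:.5f}")  # -1.680 0.09296
```

R's V is the rank sum of the positive differences. It coincides with scipy's statistic here only because the positive rank sum happens to be the smaller of the two; the two-sided p-value is the same either way, since both rank sums lie equally far from the mean $n(n+1)/4$.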