R – Performing the Wilcoxon Rank Sum Test


I have results from the same test applied to two independent samples:

x <- c(17, 12, 13, 16, 9, 19, 21, 12, 18, 17)
y <- c(10, 6, 15, 9, 8, 11, 8, 16, 13, 7, 5, 14)

And I want to compute a Wilcoxon rank sum test.

When I calculate the statistic $T_{W}$ by hand, I get:
$$
T_{W}=\sum\text{rank}(X_{i}) = 156.5
$$

When I have R run wilcox.test(x, y, correct = F), I get:

W = 101.5

Why is that? Shouldn't the statistic $W^{+}$ only be returned when I perform a signed rank test with paired = T? Or do I misunderstand the rank sum test?

How can I tell R to output $T_{W}$ as part of the test results, not through something like:

dat <- data.frame(v = c(x, y), s = factor(rep(c("x", "y"), c(10, 12))))
dat$r <- rank(dat$v)
T.W <- sum(dat$r[dat$s == "x"])

I asked a follow-up question: "Different ways to calculate the test statistic for the Wilcoxon rank sum test".

Best Answer

The Note in the help on the wilcox.test function clearly explains why R's value is smaller than yours:

Note

The literature is not unanimous about the definitions of the Wilcoxon rank sum and Mann-Whitney tests. The two most common definitions correspond to the sum of the ranks of the first sample with the minimum value subtracted or not: R subtracts and S-PLUS does not, giving a value which is larger by m(m+1)/2 for a first sample of size m. (It seems Wilcoxon's original paper used the unadjusted sum of the ranks but subsequent tables subtracted the minimum.)

That is, the definition R uses is $n_1(n_1+1)/2$ smaller than the version you use, where $n_1$ is the number of observations in the first sample.
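To make the offset concrete, here is a minimal sketch using the data from the question. It computes the unadjusted rank sum by hand, subtracts $n_1(n_1+1)/2$, and compares the result with what wilcox.test reports. The variable names (T.W, W.shifted, W.R) are only illustrative, and exact = FALSE is passed just to avoid the warning R gives about ties.

```r
x <- c(17, 12, 13, 16, 9, 19, 21, 12, 18, 17)
y <- c(10, 6, 15, 9, 8, 11, 8, 16, 13, 7, 5, 14)

n1 <- length(x)                      # size of the first sample (10)
r  <- rank(c(x, y))                  # joint ranks; ties get midranks

T.W       <- sum(r[seq_len(n1)])     # unadjusted rank sum of x: 156.5
W.shifted <- T.W - n1 * (n1 + 1) / 2 # subtract the minimum, 55: 101.5

# Statistic as reported by R (normal approximation, no continuity correction)
W.R <- unname(wilcox.test(x, y, exact = FALSE, correct = FALSE)$statistic)

all.equal(W.shifted, W.R)            # TRUE
```

The two definitions differ only by a constant that depends on $n_1$, so they lead to the same p-value.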

As for modifying the result, you could assign the output from wilcox.test to a variable, say a, and then manipulate a$statistic: add the minimum, $n_1(n_1+1)/2$, back to its value and change its name. Then when you print a (e.g. by typing a), it will look the way you want.

To see what I am getting at, try this:

a <- wilcox.test(x, y, correct = FALSE)
str(a)

So for example if you do this:

n1 <- length(x)
a$statistic <- a$statistic + n1*(n1+1)/2
names(a$statistic) <- "T.W"
a

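then you get (the output shown is from a run that left the continuity correction on, i.e. without correct=FALSE):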

        Wilcoxon rank sum test with continuity correction

data:  x and y 
T.W = 156.5, p-value = 0.006768
alternative hypothesis: true location shift is not equal to 0 
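If you need this repeatedly, the same steps can be wrapped in a small helper. This is only a sketch: wilcox_rank_sum is a hypothetical name, not a base R function, and it assumes the default (non-formula) two-sample interface with no missing values in x.

```r
# Hypothetical helper (not part of base R): runs wilcox.test() on two
# independent samples and reports the unadjusted rank sum of the first
# sample instead of R's shifted statistic.
# Assumes x contains no missing values, so length(x) is the sample size
# that wilcox.test() actually uses.
wilcox_rank_sum <- function(x, y, ...) {
  stopifnot(!anyNA(x))
  res <- wilcox.test(x, y, ...)
  n1  <- length(x)
  res$statistic <- res$statistic + n1 * (n1 + 1) / 2  # add the minimum back
  names(res$statistic) <- "T.W"                       # relabel for printing
  res
}

x <- c(17, 12, 13, 16, 9, 19, 21, 12, 18, 17)
y <- c(10, 6, 15, 9, 8, 11, 8, 16, 13, 7, 5, 14)

# exact = FALSE only silences the warning about ties
res <- wilcox_rank_sum(x, y, exact = FALSE, correct = FALSE)
res   # prints the usual htest output with T.W in place of W
```

Only the displayed statistic changes; the p-value is whatever wilcox.test computed.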

It's quite common to refer to the rank sum statistic (whether shifted by $n_1(n_1+1)/2$ or not) as either $W$ or $w$ or some close variant. It also often gets called '$U$' because of Mann & Whitney. There's plenty of precedent for using $W$, so for myself I wouldn't bother with the line that changes the name of the statistic, but if it suits you to do so there's no reason why you shouldn't, either.
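As a side illustration of the '$U$' naming, the shifted statistic that R reports can also be obtained by counting pairs: the number of $(x_i, y_j)$ pairs with $x_i > y_j$, with each tied pair counted as one half. A minimal sketch with the question's data (the names gt, eq and U are only illustrative):

```r
x <- c(17, 12, 13, 16, 9, 19, 21, 12, 18, 17)
y <- c(10, 6, 15, 9, 8, 11, 8, 16, 13, 7, 5, 14)

gt <- sum(outer(x, y, ">"))    # pairs where the x value exceeds the y value
eq <- sum(outer(x, y, "=="))   # tied pairs, each counted as one half
U  <- gt + 0.5 * eq            # 101.5

# Same value that wilcox.test() labels W
W.R <- unname(wilcox.test(x, y, exact = FALSE, correct = FALSE)$statistic)
all.equal(U, W.R)              # TRUE
```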