I have results from the same test applied to two independent samples:
x <- c(17, 12, 13, 16, 9, 19, 21, 12, 18, 17)
y <- c(10, 6, 15, 9, 8, 11, 8, 16, 13, 7, 5, 14)
And I want to compute a Wilcoxon rank sum test.
When I calculate the statistic $T_{W}$ by hand, I get:
$$
T_{W}=\sum\text{rank}(X_{i}) = 156.5
$$
When I let R perform wilcox.test(x, y, correct = FALSE), I get:
W = 101.5
Why is that? Shouldn't the statistic $W^{+}$ only be returned when I perform a signed rank test with paired = TRUE? Or do I misunderstand the rank sum test?
How can I tell R to output $T_{W}$ as part of the test results, not through something like:
dat <- data.frame(v = c(x, y), s = factor(rep(c("x", "y"), c(10, 12))))
dat$r <- rank(dat$v)
T.W <- sum(dat$r[dat$s == "x"])
I asked a follow-up question about the meaning of the different statistics: "Different ways to calculate the test statistic for the Wilcoxon rank sum test".
Best Answer
The Note in the help on the wilcox.test function explains why R's value is smaller than yours: R subtracts the minimum possible value from the rank sum. That is, the definition R uses is $n_1(n_1+1)/2$ smaller than the version you use, where $n_1$ is the number of observations in the first sample.
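With the numbers from the question, where the first sample has $n_1 = 10$ observations, the shift works out as:
$$
W = T_{W} - \frac{n_1(n_1+1)}{2} = 156.5 - \frac{10 \cdot 11}{2} = 156.5 - 55 = 101.5
$$
which matches the value R reports.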
As for modifying the result, you could assign the output from wilcox.test into a variable, say a, and then manipulate a$statistic by adding the minimum to its value and changing its name. Then when you print a (e.g. by typing a), it will look the way you want. To see what I am getting at, try inspecting the object that wilcox.test returns.
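A minimal sketch of such an inspection, assuming the x and y vectors from the question. The choice of class(), names() and str() is my own illustration, and exact = FALSE is added only to avoid the warning about ties; it does not change the statistic.

```r
# Data from the question
x <- c(17, 12, 13, 16, 9, 19, 21, 12, 18, 17)
y <- c(10, 6, 15, 9, 8, 11, 8, 16, 13, 7, 5, 14)

# Store the test result instead of just printing it
a <- wilcox.test(x, y, exact = FALSE, correct = FALSE)

class(a)          # the result is an object of class "htest"
names(a)          # its components include "statistic" and "p.value"
str(a$statistic)  # a named numeric; the name "W" is the label that gets printed
```

Because the printed label is just the name of a$statistic, both the value and the label can be changed before printing.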
So for example, if you add $n_1(n_1+1)/2$ to a$statistic and rename it, printing a then reports the rank sum, 156.5, in place of W = 101.5.
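A sketch of that adjustment, assuming the x and y vectors from the question. The label "T.W" is borrowed from the question's own variable name, and exact = FALSE is my addition to avoid the ties warning.

```r
# Data from the question
x <- c(17, 12, 13, 16, 9, 19, 21, 12, 18, 17)
y <- c(10, 6, 15, 9, 8, 11, 8, 16, 13, 7, 5, 14)

a <- wilcox.test(x, y, exact = FALSE, correct = FALSE)

# Add back the minimum possible rank sum that R subtracts
n1 <- length(x)
a$statistic <- a$statistic + n1 * (n1 + 1) / 2

# Optional: relabel the statistic so the printout no longer says "W"
names(a$statistic) <- "T.W"

# The statistic line of the printout now shows T.W = 156.5
print(a)
```

Only the displayed statistic changes; the p-value stored in a is untouched, which is fine because the two definitions differ by a constant.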
It's quite common to refer to the rank sum test (whether shifted by $n_1(n_1+1)/2$ or not) as either $W$ or $w$ or some close variant (e.g. here or here). It also often gets called '$U$' because of Mann & Whitney. There's plenty of precedent for using $W$, so for myself I wouldn't bother with the line that changes the name of the statistic, but if it suits you to do so there's no reason why you shouldn't, either.