Solved – Understanding the Wilcoxon rank-sum one-sided test


With a simple two-sided test, the null hypothesis is often set as:

$H_0: \mu_1 - \mu_2 = 0$

and if we get a p-value less than 0.05, we can reject the null hypothesis and accept the alternative. That is:

$H_a: \mu_1 - \mu_2 \neq 0$

But if we expect a difference in a particular direction, we can run a one-sided test instead. Taking the example from here, we run a one-sided Wilcoxon rank-sum test on the data.

set.seed(123)
Nj  <- c(20, 30)
DVa <- rnorm(Nj[1], mean= 95, sd=15)
DVb <- rnorm(Nj[2], mean=100, sd=15)
wIndDf <- data.frame(DV=c(DVa, DVb),
                     IV=factor(rep(1:2, Nj), labels=LETTERS[1:2]))

Although it is not included on the original website, we can visualise the data:

boxplot(DV ~ IV, data = wIndDf)

We also check the levels of the factor:

levels(wIndDf$IV)

And we see that A will be compared to B. To run a one-sided test, the code (taken directly from the website) is:

wilcox.test(DV ~ IV, alternative="less", conf.int=TRUE, data=wIndDf)

And we get a p-value < 0.05, so we can reject the null. But what were the null and the alternative?

From the alternative argument, am I correct in saying the null is:

$\mu_A < \mu_B$

Best Answer

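The IV variable is a factor with two levels in the order A, B. A is the reference level and B is the comparator, so the difference in location is of the form A - B. With alternative = "less", the alternative hypothesis is that this location shift is less than 0 and the null is that it equals 0; the $\mu_A < \mu_B$ in the question is therefore (loosely) the alternative, not the null. It's easy to check the direction by giving B a much larger mean and running the test with "greater" and with "less". The estimate and its CI are also presented the same way as for a t-test. A more interesting question is what exactly is smaller or greater with the Wilcoxon test. Its interpretation and the credibility of its results have been questioned extensively: the quantity it estimates is neither a difference in medians nor a difference in means unless strong distributional assumptions (such as a pure location shift) are met.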
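The direction check can be sketched in R as follows. It reuses the question's simulated data; the +100 added to B and the names `shifted`, `less`, `greater` and `flipped` are illustrative choices, not from the original post. With B moved far above A, the "less" test should be overwhelmingly significant and the estimated shift negative, and making B the reference level with `relevel()` should flip the sign of the estimate.

```r
# Data from the question (same seed, sizes and means as posted)
set.seed(123)
Nj  <- c(20, 30)
DVa <- rnorm(Nj[1], mean = 95, sd = 15)
DVb <- rnorm(Nj[2], mean = 100, sd = 15)

# Give B a much larger mean; the extra +100 is an arbitrary, illustrative choice
shifted <- data.frame(DV = c(DVa, DVb + 100),
                      IV = factor(rep(1:2, Nj), labels = LETTERS[1:2]))

less    <- wilcox.test(DV ~ IV, data = shifted,
                       alternative = "less", conf.int = TRUE)
greater <- wilcox.test(DV ~ IV, data = shifted,
                       alternative = "greater", conf.int = TRUE)

# A is the first (reference) level, so the shift is A - B:
# "less" should be highly significant, "greater" not, and the estimate negative
c(less = less$p.value, greater = greater$p.value)
less$estimate

# Making B the reference level reverses the direction to B - A
shifted$IV <- relevel(shifted$IV, ref = "B")
flipped <- wilcox.test(DV ~ IV, data = shifted,
                       alternative = "greater", conf.int = TRUE)
flipped$estimate   # same magnitude as less$estimate, opposite sign
```

The same approach works on the unshifted data: swapping the level order is enough to turn a "less" question about A - B into a "greater" question about B - A.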

Related Solutions

R – Two-Sample One-Sided Kolmogorov-Smirnov Test vs. One-Sided Wilcoxon-Mann-Whitney Test

Both are testing for displacement of the x variable with respect to the y variable, but the 2 tests have opposite meanings for the term "greater" (and therefore also for "less").

In ks.test, "greater" means that the CDF of x lies above the CDF of y, which means that quantities such as the mean and the median will be smaller in x than in y. In wilcox.test and t.test it is the other way round: if you believe the alternative "greater" is true, the mean, median, etc. will be greater in x than in y.
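A small deterministic sketch of the two conventions, using hypothetical samples where `y` is `x` moved up by 1 (so `x` takes the smaller values). It evaluates both empirical CDFs with `ecdf()` on an arbitrary grid: the CDF of `x` is never below that of `y` (the ks.test sense of "greater"), while the mean and median of `x` are smaller (what wilcox.test and t.test would call "less").

```r
# Hypothetical deterministic samples: y is x moved up by 1,
# so x takes the smaller values
x <- qnorm(ppoints(25))
y <- x + 1

Fx <- ecdf(x)
Fy <- ecdf(y)
grid <- seq(-3, 4, by = 0.1)   # arbitrary evaluation points

# The ECDF of x is never below that of y ("greater" in the ks.test sense) ...
all(Fx(grid) >= Fy(grid))
# ... while the mean and median of x are smaller ("less" in the wilcox.test sense)
c(mean(x) < mean(y), median(x) < median(y))
```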

An example from R:

> x <- rnorm(25)
> y <- rnorm(25, 1)
> 
> ks.test(x,y, alt='greater')

        Two-sample Kolmogorov-Smirnov test

data:  x and y 
D = 0.6, p-value = 0.0001625
alternative hypothesis: two-sided 

> wilcox.test( x, y, alt='greater' )

        Wilcoxon rank sum test

data:  x and y 
W = 127, p-value = 0.9999
alternative hypothesis: true location shift is greater than 0 

> wilcox.test( x, y, alt='less' )

        Wilcoxon rank sum test

data:  x and y 
W = 127, p-value = 0.000101
alternative hypothesis: true location shift is less than 0

Here I generated 2 samples from normal distributions, both with sample size 25 and standard deviation 1. The x variable comes from a distribution with mean 0 and the y variable from one with mean 1. You can see that ks.test with alt='greater' gives a very significant result even though x has the smaller mean; this is because the CDF of x lies above that of y. The wilcox.test function shows no significance in the "greater" direction, but a similar level of significance in the "less" direction.
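One caveat about the transcript: its ks.test output reports "alternative hypothesis: two-sided". In ks.test the `alternative` argument comes after `...`, so R only matches it by its full name; the abbreviation `alt` is absorbed by `...` and the default two-sided test is run. In the default wilcox.test method `alternative` comes before `...`, so `alt` is partially matched there. A sketch with freshly simulated data (the seed is an arbitrary choice):

```r
# Two hypothetical samples, as in the transcript: x has the smaller mean
set.seed(1)
x <- rnorm(25)
y <- rnorm(25, 1)

# 'alt' cannot be partially matched here: 'alternative' follows '...'
# in ks.test's argument list, so 'alt' ends up in '...' and is not used
ks_abbrev <- ks.test(x, y, alt = "greater")
ks_abbrev$alternative    # "two-sided", as in the transcript

# Spelled out in full, the one-sided test is actually run
ks_greater <- ks.test(x, y, alternative = "greater")
ks_greater$alternative   # now describes a one-sided alternative

# In wilcox.test 'alternative' precedes '...', so the abbreviation works
wilcox.test(x, y, alt = "less")$alternative   # "less"
```

Writing `alternative =` in full avoids the problem for both functions.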

Both tests are different approaches to testing the same idea, but "greater" and "less" mean different (conceptually opposite) things in the 2 tests.

Hypothesis Testing – Methodology for Wilcoxon Rank Sum Test

I have selected this nonparametric test because it makes no assumption about data distribution.

This is not quite the case: it makes some assumptions (such as continuity); it just doesn't assume a specific functional form.

Is the Wilcoxon rank-sum test based on means (as I think in steps 3 and 4) or on medians?

Neither. It's the median of pairwise differences (the two-sample Hodges-Lehmann estimate) that we're dealing with.
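A minimal sketch of that point, using hypothetical normal samples (seed, sizes and means chosen arbitrarily, with no ties and both groups under 50 so that R uses its exact method): the "difference in location" that `wilcox.test(..., conf.int = TRUE)` reports matches the median of all pairwise differences `x[i] - y[j]`, which is generally not the same as the difference in sample medians.

```r
# Two hypothetical continuous samples (no ties, both n < 50)
set.seed(42)
x <- rnorm(20, mean = 1)
y <- rnorm(30)

# Two-sample Hodges-Lehmann estimate: median of all n_x * n_y differences x_i - y_j
hl <- median(outer(x, y, "-"))

# The "difference in location" reported by wilcox.test with conf.int = TRUE
wt <- wilcox.test(x, y, conf.int = TRUE)
c(wilcox = unname(wt$estimate), hodges_lehmann = hl)

# The difference in sample medians is a different quantity
median(x) - median(y)
```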

See this post for some discussion on that point (near the top of the post).

As whuber quite rightly points out below, under the location-shift alternative, it's a difference in means or medians as much as it is a median of pairwise differences.
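Under a pure location shift the three quantities coincide. A sketch at the sample level, assuming a hypothetical skewed sample `x` (exponential, chosen only for illustration) and `y <- x + 2`, so `y` has the same shape and spread with its location moved by exactly 2:

```r
# Hypothetical skewed sample and an exact location shift of +2
set.seed(7)
x <- rexp(40)
y <- x + 2

mean(y) - mean(x)          # approximately 2
median(y) - median(x)      # approximately 2
median(outer(y, x, "-"))   # Hodges-Lehmann difference: also approximately 2
```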

See this post for a discussion of both the location-shift alternative and the more general alternative that the Wilcoxon-Mann-Whitney is sensitive to; there is some more discussion at the end of the post here.

Is it possible that A can be better than B and at the same time $E(X)<E(Y)$?

Certainly, if by 'better' you mean "has a high median pairwise difference".

Note that your density displays show a roughly similar asymmetric shape but quite different spread; that's one way (of a number of ways) you might see it. Different shapes but similar spread can also produce it. If there's only a shift in location, the difference in population means and the population median pairwise difference will be the same; but even with a pure location shift in the populations, the samples might show opposite shifts.
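The opposite-sign case can be illustrated with a tiny hypothetical example: A is slightly larger than B most of the time, but one very low value drags A's mean below B's. The mean difference and the median pairwise difference then point in opposite directions.

```r
# Hypothetical samples: A beats B by 1 in 9 of 10 values, but has one extreme low value
a <- c(rep(1, 9), -100)
b <- rep(0, 10)

mean(a) - mean(b)          # -9.1: A has the smaller mean
median(outer(a, b, "-"))   # 1: the median pairwise difference favours A
```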

Is the following methodology correct?

As expressed, I don't understand it. For example, the comparison "if x==y" doesn't make sense: why would the samples be identical, and if they were, what would be the point in proceeding, since no test can find a difference?

If I wish to test A against several algorithms B, C, ..., what could be the best approach to take?

What would be best depends on many things which I don't have the information to answer (if you want a nonparametric test I'd suggest considering permutation tests with good power against whatever alternative is of primary interest). The $k$-sample equivalent of the Wilcoxon-Mann-Whitney would be the Kruskal-Wallis test, so if you're happy with the WMW, you might consider the KW.
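A sketch of the k-sample route, assuming hypothetical scores for A, B and C (the seed, sizes and means are arbitrary). It runs `kruskal.test()` on all three groups, and also checks that with just two groups Kruskal-Wallis gives the same p-value as the two-sided WMW test using the normal approximation without continuity correction, which is one way to see that KW extends the WMW.

```r
# Hypothetical scores for algorithm A and two competitors B and C
set.seed(2024)
A <- rnorm(15, mean = 1)
B <- rnorm(15)
C <- rnorm(15)

# k-sample rank test across all three groups (df = k - 1 = 2)
kw3 <- kruskal.test(list(A = A, B = B, C = C))
kw3$p.value

# With only two groups, Kruskal-Wallis matches the two-sided WMW test
# based on the normal approximation with no continuity correction
kw2 <- kruskal.test(list(A, B))
wmw <- wilcox.test(A, B, exact = FALSE, correct = FALSE)
c(kruskal = kw2$p.value, wilcoxon = wmw$p.value)
```

A significant KW result says only that some group differs; comparing A with each competitor afterwards still needs its own multiplicity handling.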
