Solved – How to interpret the Mann-Whitney U when using R’s formula interface

nonparametricrwilcoxon-mann-whitney-test

Say we have the following data:

set.seed(123)
data <- data.frame(x = c(rnorm(50, 1, 1), rnorm(50, 5, 2)),
                   y = c(rep('A', 50),    rep('B', 50)))

Which yields the following boxplot (boxplot(data$x ~ data$y)):

boxplot

Now let's say I want to test if the two samples have the same location parameters (median and/or mean). In my real case, the data are clearly not normal, so I've decided to run the Wilcoxon-Mann-Whitney test, like this:

wilcox.test(data$x ~ data$y)

However, I would like the alternative hypothesis to be that B, data$y's "second" factor, comes from a distribution with higher position parameters. I've tried setting the alternative parameter to "greater" and "less", but apparently the alternative hypotheses are not what I'm looking for. For example, alternative = "greater" tells me "alternative hypothesis: true location shift is greater than 0"; alternative = "less" tells me "alternative hypothesis: true location shift is less than 0".

How can I tweak the wilcox.test() function in order to have the alternative hypothesis I want (B comes from a distribution with higher position parameters than A)? Or should I just use another test instead?

Best Answer

Technically, the reference category and the direction of the test depend on the way the factor variable is encoded. With your toy data:

> wilcox.test(x ~ y, data=data, alternative="greater")

    Wilcoxon rank sum test with continuity correction

data:  x by y 
W = 52, p-value = 1
alternative hypothesis: true location shift is greater than 0 

> wilcox.test(x ~ y, data=data, alternative="less")

    Wilcoxon rank sum test with continuity correction

data:  x by y 
W = 52, p-value < 2.2e-16
alternative hypothesis: true location shift is less than 0

Notice that the W statistic is the same in both cases but the test uses opposite tails of its sampling distribution. Now let's look at the factor variable:

> levels(data$y)
[1] "A" "B"

We can recode it to make "B" the first level:

> data$y <- factor(data$y, levels=c("B", "A"))

Now we have:

> levels(data$y)
[1] "B" "A"

Note that we did not change the data themselves, just the way the categorical variable is encoded “under the hood”:

> head(data)
          x y
1 0.4395244 A
2 0.7698225 A
3 2.5587083 A
4 1.0705084 A
5 1.1292877 A
6 2.7150650 A

> aggregate(data$x, by=list(data$y), mean)
  Group.1        x
1       B 5.292817
2       A 1.034404

But the directions of the test are now inverted:

> wilcox.test(x ~ y, data=data, alternative="greater")

    Wilcoxon rank sum test with continuity correction

data:  x by y 
W = 2448, p-value < 2.2e-16
alternative hypothesis: true location shift is greater than 0

The W statistic is different but the p-value is the same than for the alternative="less" test with the categories in the original order. With the original data, it could be interpreted as “the location shift from B to A is less than 0” and with the recoded data it becomes “the location shift from A to B is greater than 0” but this is really the same hypothesis (but see Glen_b's comments to the question for the correct interpretation).

In your case, it therefore seems that the test you want is alternative="less" (or, equivalently, alternative="greater" with the recoded data). Does that help?

Related Solutions

R – Two-Sample One-Sided Kolmogorov-Smirnov Test vs. One-Sided Wilcoxon-Mann-Whitney Test

Both are testing for displacement of the x variable with respect to the y variable, but the 2 tests have opposite meanings for the term "greater" (and therefor also or "less").

In the ks.test "greater" means that the CDF of 'x' is higher than the CDF of 'y' which means that things like the mean and the median will be smaller values in 'x' than in 'y' if the CDF of 'x' is "greater" than the CDF of 'y'. In 'wicox.test' and 't.test' the mean, median, etc. will be greater in 'x' than in 'y' if you believe that the alternative of "greater" is true.

An example from R:

> x <- rnorm(25)
> y <- rnorm(25, 1)
> 
> ks.test(x,y, alt='greater')

        Two-sample Kolmogorov-Smirnov test

data:  x and y 
D = 0.6, p-value = 0.0001625
alternative hypothesis: two-sided 

> wilcox.test( x, y, alt='greater' )

        Wilcoxon rank sum test

data:  x and y 
W = 127, p-value = 0.9999
alternative hypothesis: true location shift is greater than 0 

> wilcox.test( x, y, alt='less' )

        Wilcoxon rank sum test

data:  x and y 
W = 127, p-value = 0.000101
alternative hypothesis: true location shift is less than 0

Here I generated 2 samples from a normal distribution, both with sample size 25 and standard deviation of 1. The x variable comes from a distribution of mean 0 and the y variable from a distribution of mean 1. You can see the results of ks.test give a very significant result testing in the "greater" direction even though x has the smaller mean, this is because the CDF of x is above that of y. The wilcox.test function shows lack of significance in the "greater" direction, but similar level of significance in the "less" direction.

Both tests are different approaches to testing the same idea, but what "greater" and "less" mean to the 2 tests are different (and conceptually opposite).

Wilcoxon Mann-Whitney Test – How to Determine Which Group Has Higher Median in Significant Two-Sided Wilcoxon Rank Sum Test?

Since I know that the Wilcoxon test compares pseudo-medians

Not quite. The Wilcoxon signed rank test compares the one-sample Hodges-Lehmann statistic (median-of-within-sample-pairwise-averages, equivalently median of Walsh averages, or pseudomedian) to 0. But the rank-sum test compares the two-sample Hodges-Lehmann statistic (the median of between-sample pairwise differences as described in the second paragraph under "Definition" at the link) to zero -- it does not compare two one-sample pseudomedians.

Based on my descriptive graphs, i.e. boxplots, jitter plots, it is not immediately visible which of the two groups I am comparing has the higher/lower median/mean.

You seem to be assuming that the difference-in-mean and the difference-in-median will behave like the median-pairwise-difference (in the sense that if one is different the other two will be and in the same direction).

This will often be true but it is not necessarily the case.

A population (or indeed, a sample) can have any of the three be different in some given direction while one or both the others are not different or even arranged in the opposite direction.

I am not sure which way would be appropriate to answer the question which of the two groups of which I know are significantly different w.r.t. the outcome variable, is is greater than the other.

If you're asking "which group did the rank sum test think had higher location?" (i.e. what difference caused the rejection?), then compute the median pairwise difference. This is a simple calculation and most decent stats packages will offer the calculation (at least as an option relating to a confidence interval for the difference) with the rank sum test.

If you're asking "in what direction do the medians differ" or "in what direction do the means differ" then you won't necessarily have an answer consistent with the rank sum test -- if you care about one of those, test that instead (perhaps with a permutation test based on those particular statistics).

If you assume that the distributions are the same up to a possible location shift under the alternative, and if you assume population means exist (generally a quite reasonable assumption) then you've already made the necessary assumption to attribute the direction of difference to the direction the rank sum test looked at.

Conduct a one-sided test and see which is significant.

You could, if you do it at half the significance level, but it would seem to be a fairly involved way to go about finding what can be obtained via a simple sample calculation. If you don't have a convenient way to do the calculation otherwise, it should work just fine.

Look at the means/medians within the two groups and whichever is greater is the one which is significantly greater.

You could find mean and median both have group 1 greater but the rank sum rejected in the other direction. Or you could have mean greater in one direction and median greater in the other direction. Or you could have both group-means and both group-medians be equal even though the rank-sum test rejected.

So this would not be a generally good choice.

Here's an example in R. First, adding the confidence interval calculation gets the location difference estimate, then calculating it directly from the sample. This assumes there's already data (in x and y):

> wilcox.test(x,y,conf.int=TRUE,conf.level=0.9)

        Wilcoxon rank sum test

data:  x and y
W = 9, p-value = 0.05927
alternative hypothesis: true location shift is not equal to 0
90 percent confidence interval:
 -15.3239889  -0.5774458
sample estimates:
difference in location 
             -8.891949 

> median(outer(x,y,"-")) # calculate median of pairwise differences
[1] -8.891949

So this tells us that y tends to be larger (as measured by the rank-sum statistic) than x (since the x-y differences tend to be negative)

An aside on this bit:

This let's me confidently reject the null hypothesis of no true location shift at a confidence level < 0.1

There's a few thing wrong there. That doesn't really let us "confidently" do anything, and 0.1 would be your significance level, not a confidence level. If I wanted to speak with some sort of confidence about an effect, I'd tend to be looking at effect sizes, confidence intervals and I'd at least want some (before-seeing-the-data) sense of the power against an anticipated/useful effect-size.

Best Answer

Related Solutions

R – Two-Sample One-Sided Kolmogorov-Smirnov Test vs. One-Sided Wilcoxon-Mann-Whitney Test

Wilcoxon Mann-Whitney Test – How to Determine Which Group Has Higher Median in Significant Two-Sided Wilcoxon Rank Sum Test?

Related Question