The Kruskal-Wallis test could also be used, as it's a non-parametric analogue of one-way ANOVA, and it is often considered to be more powerful than Mood's median test. It can be carried out in R using the kruskal.test function in the stats package.
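A minimal sketch of the call (the three-group data here is simulated purely for illustration; the group structure is an assumption, not taken from the question):

```r
# Illustrative data: three groups of 100, with means 1 SD apart
set.seed(1)
y <- c(rnorm(100, 0), rnorm(100, 1), rnorm(100, 2))
g <- factor(rep(c("g1", "g2", "g3"), each = 100))

# Kruskal-Wallis rank sum test (stats is loaded by default)
kruskal.test(y ~ g)
```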
To respond to your edit: interpreting K-W is similar to interpreting a one-way ANOVA. A significant p-value corresponds to rejecting the null (which for K-W is that all three groups come from the same distribution; loosely, that all three means are equal). You must use a follow-up test (again, just like with an ANOVA) to answer questions about specific groups; which follow-up is appropriate typically depends on the specific research questions you have. Just by looking at the parameters of the simulation, all three groups should be significantly different from one another in a follow-up test (as they're all 1 SD apart with N = 100).
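One possible follow-up (not the only choice; Dunn's test is another common option) is pairwise rank-sum comparisons with a multiplicity adjustment. A sketch, with data simulated to match the described setup of three groups 1 SD apart at N = 100:

```r
# Illustrative data matching the described simulation parameters
set.seed(2)
y <- c(rnorm(100, 0), rnorm(100, 1), rnorm(100, 2))
g <- factor(rep(c("g1", "g2", "g3"), each = 100))

# All pairwise Wilcoxon rank-sum comparisons, Holm-adjusted p-values
pairwise.wilcox.test(y, g, p.adjust.method = "holm")
```

With effects this large, all three pairwise comparisons should come out significant.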
Note that it's quite possible for two continuous distributions to yield rank-sums equal to their expected value under the null, or rank sums that differ by the smallest possible amount (1, as in this case). In either case all other arrangements would be "at least as extreme" in the two-tailed test, so the p-value would be 1.
Which is to say, you can quite easily get a p-value of exactly 1 without any of the values being equal to any of the other values.
For example, imagine we have the following 22 (combined & sorted) sample values:
1.961 4.160 6.561 6.633 7.454 7.958 8.200 8.488 8.635 8.698
8.881 9.099 10.086 11.178 11.711 11.926 12.546 13.026 13.242 14.025
14.822 17.167
Then if (for example) the two groups of 11 had the following items from that list:
g1: 2 3 6 7 10 11 14 15 18 19 22
g2: 1 4 5 8 9 12 13 16 17 20 21
(i.e. these now represent the ranks).
Which is to say the two groups have the following data:
y1: 4.160 6.561 7.958 8.200 8.698 8.881 11.178 11.711 13.026 13.242 17.167
y2: 1.961 6.633 7.454 8.488 8.635 9.099 10.086 11.926 12.546 14.025 14.822
Then the sum of ranks in the two groups differ only by 1 (and without ties it's not possible for them to differ by less), and the p-value must then be exactly 1:
wilcox.test(y1,y2)
Wilcoxon rank sum test
data: y1 and y2
W = 61, p-value = 1
alternative hypothesis: true location shift is not equal to 0
Yet both the means and medians are different.
[There are many ways to split the values 1,2,...,22 up into two sets of 11 so that the sum of each set is either 126 or 127 -- i.e. 253/2 rounded up or down; this particular one just happened to be easy to obtain.]
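You can verify the rank sums (and the reported W) directly from the data above:

```r
y1 <- c(4.160, 6.561, 7.958, 8.200, 8.698, 8.881,
        11.178, 11.711, 13.026, 13.242, 17.167)
y2 <- c(1.961, 6.633, 7.454, 8.488, 8.635, 9.099,
        10.086, 11.926, 12.546, 14.025, 14.822)

r <- rank(c(y1, y2))
sum(r[1:11])   # rank sum for y1: 127
sum(r[12:22])  # rank sum for y2: 126

# R's W is the y1 rank sum minus n1*(n1+1)/2 = 66, giving W = 61
sum(r[1:11]) - 11 * 12 / 2
```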
Note that the Wilcoxon rank sum test is neither a test of means nor a test of medians, and both may differ while the test sees the two samples as not different. Alternatively, you could be in a situation where the means are the same, or the medians are the same (or even both means and medians are equal across samples), while the Wilcoxon rank sum test nevertheless rejects the null (because it doesn't consider either of them).
(I'd regard the advice in comments of "try a t-test" to amount to p-hacking. I see no reason whatever to abandon the test you did.)
Best Answer
Not quite. The Wilcoxon signed rank test compares the one-sample Hodges-Lehmann statistic (median-of-within-sample-pairwise-averages, equivalently median of Walsh averages, or pseudomedian) to 0. But the rank-sum test compares the two-sample Hodges-Lehmann statistic (the median of between-sample pairwise differences as described in the second paragraph under "Definition" at the link) to zero -- it does not compare two one-sample pseudomedians.
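To make the distinction concrete, both statistics can be computed directly; a sketch with arbitrary illustrative samples x and y (not data from the question):

```r
set.seed(3)
x <- rnorm(20)
y <- rnorm(20, 0.5)

# One-sample Hodges-Lehmann statistic of x (the pseudomedian):
# the median of the Walsh averages (x_i + x_j)/2 over all i <= j
walsh <- outer(x, x, "+") / 2
pseudomed_x <- median(walsh[lower.tri(walsh, diag = TRUE)])

# Two-sample Hodges-Lehmann statistic: the median of all
# pairwise between-sample differences x_i - y_j
hl_shift <- median(outer(x, y, "-"))

# wilcox.test reports these same estimates when conf.int = TRUE
wilcox.test(x, conf.int = TRUE)$estimate     # pseudomedian of x
wilcox.test(x, y, conf.int = TRUE)$estimate  # location-shift estimate
```

Note that the two-sample estimate is not the difference of the two one-sample pseudomedians.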
You seem to be assuming that the difference-in-mean and the difference-in-median will behave like the median-pairwise-difference (in the sense that if one is different the other two will be and in the same direction).
This will often be true but it is not necessarily the case.
A population (or indeed, a sample) can have any of the three be different in some given direction while one or both of the others are not different, or even differ in the opposite direction.
If you're asking "which group did the rank sum test think had higher location?" (i.e. what difference caused the rejection?), then compute the median pairwise difference. This is a simple calculation and most decent stats packages will offer the calculation (at least as an option relating to a confidence interval for the difference) with the rank sum test.
If you're asking "in what direction do the medians differ" or "in what direction do the means differ" then you won't necessarily have an answer consistent with the rank sum test -- if you care about one of those, test that instead (perhaps with a permutation test based on those particular statistics).
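A sketch of such a permutation test on the difference in medians (x and y are placeholder samples; the same template works for means by swapping in mean):

```r
set.seed(4)
x <- rnorm(30)
y <- rnorm(30, 0.5)

obs <- median(x) - median(y)
pooled <- c(x, y)
nx <- length(x)

# Re-randomise group labels and recompute the median difference
perm <- replicate(10000, {
  s <- sample(length(pooled), nx)
  median(pooled[s]) - median(pooled[-s])
})

# Two-sided permutation p-value
mean(abs(perm) >= abs(obs))
```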
If you assume that the distributions are the same up to a possible location shift under the alternative, and if you assume population means exist (generally a quite reasonable assumption) then you've already made the necessary assumption to attribute the direction of difference to the direction the rank sum test looked at.
You could, if you do it at half the significance level, but it would seem to be a fairly involved way to go about finding what can be obtained via a simple sample calculation. If you don't have a convenient way to do the calculation otherwise, it should work just fine.
You could find mean and median both have group 1 greater but the rank sum rejected in the other direction. Or you could have mean greater in one direction and median greater in the other direction. Or you could have both group-means and both group-medians be equal even though the rank-sum test rejected.
So this would not be a generally good choice.
Here's an example in R: first, adding the confidence interval calculation to wilcox.test gets the location-difference estimate, and then the same quantity is calculated directly from the sample. This assumes the data are already in x and y. The output tells us that y tends to be larger than x (as measured by the rank-sum statistic), since the x - y differences tend to be negative.
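A sketch of that calculation, assuming the two samples are already in x and y (the simulated data below is only illustrative):

```r
set.seed(5)
x <- rnorm(25)     # illustrative data only
y <- rnorm(25, 1)  # y shifted up, so x - y differences tend to be negative

# conf.int = TRUE makes wilcox.test report the location-difference
# estimate (the two-sample Hodges-Lehmann statistic) alongside the test
wilcox.test(x, y, conf.int = TRUE)$estimate

# The same quantity computed directly from the sample:
# the median of all pairwise x - y differences
median(outer(x, y, "-"))
```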
An aside on this bit:
There are a few things wrong there. That doesn't really let us "confidently" do anything, and 0.1 would be your significance level, not a confidence level. If I wanted to speak with some sort of confidence about an effect, I'd tend to be looking at effect sizes and confidence intervals, and I'd at least want some (before-seeing-the-data) sense of the power against an anticipated/useful effect size.