Wilcoxon Mann Whitney Test – Criteria for Data Shape in Mann Whitney U Test


As far as I know, if the Mann-Whitney assumption that the two distributions have a similar shape is not met, the test compares mean ranks rather than medians. What does this mean for the result?
I am analyzing the difference in Intrinsic Motivation Inventory scores between two independent groups (6 in the experiment group and 5 in the control group). I am testing motivation after using a fitness application: the experiment group used my application and the control group used another application. All subjects are university students (8 males, 3 females).

Best Answer

If two samples have roughly the same shape, then the Mann-Whitney-Wilcoxon test (a rank sum test) can be considered a test of whether the locations (often expressed as medians) differ. Consider fictitious data sampled in R.

set.seed(104)
y1 = rgamma(100, 3, 1/3)
y2 = rgamma(100, 3, 1/3) + 4
median(y1);  median(y2)
[1] 7.684493
[1] 11.85169

stripchart(list(y1,y2), ylim=c(.5,2.5), pch="|")

Because the P-value of the Wilcoxon rank sum test is near $0$, we can say that the sample medians $7.68$ and $11.85$ are significantly different at the 1% level.

wilcox.test(y1, y2)

        Wilcoxon rank sum test with continuity correction

data:  y1 and y2
W = 2634, p-value = 7.477e-09
alternative hypothesis: true location shift is not equal to 0
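As a minimal sketch of what "location shift" means here, `wilcox.test` can also report an estimate of the shift and a confidence interval when called with `conf.int = TRUE`. According to the R documentation, this estimate is the median of the pairwise differences between the two samples (the Hodges-Lehmann estimator), not the difference of the sample medians. The variable names `res` and `hl` are only illustrative.

```r
set.seed(104)
y1 <- rgamma(100, 3, 1/3)
y2 <- rgamma(100, 3, 1/3) + 4

# Ask for the shift estimate and its confidence interval
res <- wilcox.test(y1, y2, conf.int = TRUE)
res$estimate   # estimated shift of y1 relative to y2 (negative here)
res$conf.int   # 95% confidence interval for that shift

# Median of all 100 x 100 pairwise differences, computed directly;
# it should be close to res$estimate
hl <- median(outer(y1, y2, "-"))
hl
```

Since `y2` was generated by adding 4, an estimate near -4 is what one would hope to see when the equal-shape assumption holds.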

However, if two samples have distinctly different shapes, rejection of the null hypothesis of the Wilcoxon rank sum test should be interpreted to mean that the population distribution of one sample 'stochastically dominates' the population distribution of the other.

set.seed(2022)
x1 = rgamma(100, 3, 1/3)
x2 = rnorm(100, 12, 3)

summary(x1)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.868   5.643   7.962   8.411  11.274  19.239 
summary(x2)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  5.654   9.625  11.456  11.756  13.877  19.377 


stripchart(list(x1,x2), ylim=c(.5,2.5), pch="|")

wilcox.test(x1,x2)

        Wilcoxon rank sum test with continuity correction

data:  x1 and x2
W = 2531, p-value = 1.624e-09
alternative hypothesis: true location shift is not equal to 0

Stochastic domination means that values in the second population tend to be larger than values in the first. Perhaps this is best illustrated by showing the empirical CDFs (ECDFs) of the two samples. The dominating ECDF (blue in the plot produced by the code below) lies to the right of, and therefore below, the other ECDF.

hdr  = "ECDFs of x1 and x2 (blue)"
plot(ecdf(x1), xlim=c(0, 20), main=hdr)
lines(ecdf(x2), col="blue")
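The "tends to be larger" reading can also be put as a number. As a sketch: the statistic `W` that R reports for `wilcox.test(x1, x2)` counts the pairs in which the `x1` value exceeds the `x2` value (ties, absent here, would count one half), so dividing by the number of pairs estimates P(X1 > X2). With the reported W = 2531 and 100 × 100 pairs this is about 0.25, so an `x2` value exceeds an `x1` value in roughly three quarters of the pairs. The names `p_from_W` and `p_direct` are only illustrative.

```r
set.seed(2022)
x1 <- rgamma(100, 3, 1/3)
x2 <- rnorm(100, 12, 3)

W <- unname(wilcox.test(x1, x2)$statistic)

# Proportion of (x1, x2) pairs with x1 > x2, derived from W
p_from_W <- W / (length(x1) * length(x2))

# The same proportion counted directly over all pairs
p_direct <- mean(outer(x1, x2, ">"))

c(p_from_W = p_from_W, p_direct = p_direct)
1 - p_from_W   # estimated P(X2 > X1), since there are no ties
```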

Related Solutions

Solved – Best type of graph to represent data tested with the Mann-Whitney U-test

Box plots would be much more informative since they provide distributional information in addition to medians. This is particularly important when you use the Mann-Whitney U since the null hypothesis tested is somewhat vague and it is important for readers to have some idea how the distributions differ. If you only want to give the medians then a graph is not a good idea since its data density is so low.
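A minimal sketch of such a display in base R follows. The data are simulated purely for illustration (they are not from the question), and the group names are made up; overlaying the raw points is an optional extra that can help with small samples.

```r
set.seed(1)  # illustrative data only
groups <- list(group1 = rgamma(50, 3, 1/3),
               group2 = rnorm(50, 12, 3))

# Side-by-side box plots show medians, spread and skewness together
bp <- boxplot(groups, horizontal = TRUE, xlab = "Score",
              main = "Box plots of the two groups")

# Overlay the individual observations with a little jitter
stripchart(groups, method = "jitter", add = TRUE, pch = 1)

bp$stats  # five-number summary per group used to draw the boxes
```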

Solved – Have I presented this Mann-Whitney U test appropriately

As for the check with SPSS or R, suitable R code could be the following. Unfortunately I can only tell you a way via Wilcoxon W, not Mann-Whitney U. The tests are equivalent, though:

library(exactRankTests)
f <- c(rep(1,21), rep(2,17), rep(3, 82), rep(4,34), rep(5,18))
m <- c(rep(1,7), rep(2,15), rep(3,28), rep(4,13), rep(5,8))
wilcox.exact(f, m)

The result would be

> wilcox.exact(f, m)

    Asymptotic Wilcoxon rank sum test

data:  f and m
W = 6399, p-value = 0.5343
alternative hypothesis: true mu is not equal to 0

Where you could cite R in the literature as

R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

and the package exactRankTests as

Torsten Hothorn and Kurt Hornik (2019). exactRankTests: Exact Distributions for Rank and Permutation Tests. R package version 0.8-31. https://CRAN.R-project.org/package=exactRankTests
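The equivalence of Wilcoxon's W and the Mann-Whitney U mentioned above can be checked by hand. In this sketch the rank sum of the first sample is computed from midranks and converted to U by subtracting n(n+1)/2; base R's `wilcox.test` reports this U-type quantity under the label `W`. The helper names (`n_f`, `rank_sum_f`, `U_f`, `U_m`, `W_R`) are only illustrative, and base R is used instead of `exactRankTests` so the sketch has no package dependency.

```r
f <- c(rep(1, 21), rep(2, 17), rep(3, 82), rep(4, 34), rep(5, 18))
m <- c(rep(1, 7), rep(2, 15), rep(3, 28), rep(4, 13), rep(5, 8))
n_f <- length(f)
n_m <- length(m)

# Midranks of the pooled sample (ties share the average rank)
r <- rank(c(f, m))

# Rank sum of the first sample, then the Mann-Whitney U for f
rank_sum_f <- sum(r[seq_len(n_f)])
U_f <- rank_sum_f - n_f * (n_f + 1) / 2

# The U for the other sample is the complement
U_m <- n_f * n_m - U_f

# Statistic reported by base R (exact = FALSE avoids the ties warning)
W_R <- unname(wilcox.test(f, m, exact = FALSE)$statistic)

c(U_f = U_f, U_m = U_m, W_R = W_R)
```

Here `U_f` agrees with the W = 6399 shown in the output above, so reporting either statistic describes the same test.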

As for the rest of the description, that depends a lot on personal taste, faculty, etc. I for one would be careful about calling something that has been measured by only one Likert-type item a Likert scale. Also, you seem to use "Likert scale data" and "Likert score" interchangeably. Why two different terms then? Apparently, you have interviewed 243 persons. Does it seem appropriate to use that many digits for the standard deviation and p-value?

So the calculation is about right; the details of the wording are a matter of personal taste.
