Assumptions of Mann-Whitney U-test

I have 2 samples, which I obtained from a 5-point Likert scale survey. I have 9 responses for group A and 17 responses for group B on each of 12 variables. I would now like to test whether the groups differ significantly in one or more of these variables, and I chose to use the t-test. The assumptions of the t-test are: 1. interval-scaled data, 2. independent samples, 3. equal variances of the samples, 4. normally distributed data.

My data are independent, and I tested homogeneity of variance with Levene's test: nothing was significant, so I did not reject the null hypothesis of equal variances. I used the Shapiro-Wilk test to check normality and, as expected when working with Likert data, the variables were, with a few exceptions, not normally distributed. I conducted a t-test nevertheless and obtained what I consider reasonable results. However, since there is no consensus on whether to treat Likert data as ordinal or metric, and since the data are not normally distributed, I also conducted a Mann-Whitney U test (MWU test). It gave almost the same results as the t-test (3 variables were significant with the t-test, and 2 of these 3 were significant with the MWU test). Now I have read, under the assumptions of the MWU test, that the populations (the samples…) should have the same shape. I don't understand that assumption. I have made boxplots (working with SPSS), but many of the plots do not have the same form.
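As a rough illustration of what a large-sample MWU p-value involves, the sketch below computes the U statistic using midranks (ties are everywhere in Likert data) and a two-sided p-value from the tie-corrected normal approximation. It uses only the Python standard library; the function names and the two small samples are hypothetical, not the asker's data. SPSS may report exact rather than asymptotic p-values, so its numbers need not match this sketch.

```python
from math import sqrt
from statistics import NormalDist

def midranks(values):
    """Ranks 1..n; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_normal(a, b):
    """Return (U for group a, z, two-sided p) from the tie-corrected normal
    approximation, without continuity correction."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = list(a) + list(b)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # tie correction: sum of (t^3 - t) over each group of t tied values
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, z, p

# Hypothetical 5-point Likert responses (not the asker's data).
group_a = [2, 3, 3, 2, 4, 3, 1, 2, 3]
group_b = [4, 3, 5, 4, 4, 3, 5, 2, 4, 4, 3, 5, 4, 3, 4, 5, 4]

u, z, p = mann_whitney_normal(group_a, group_b)
print(f"U = {u}, z = {z:.2f}, p = {p:.4f}")
```

No continuity correction is applied here; some software applies one by default, so p-values can differ slightly.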

My questions:

  1. How can I test the "shape" assumption, and what are the implications if my samples do not fulfill it? Is there a high risk of false results? I'm not sure, since the results of the t-test and the MWU test are so similar.
  2. What influence does the size of my samples (n<10 and n<20) have on the tests (variances, t-test, MWU test)?
  3. Would you continue with the t-test results (3 significant variables) or the MWU-test results (2 significant variables)?
  4. After I have concluded that 2 or 3 of the variables are significant, how can I test which group is the "better" one, e.g. which group mostly ticked "increased" rather than "decreased"?

Thank you so much in advance. I have just started dealing with stats and SPSS, so any help is very welcome.

Best Answer

Consider the Gaussian (normal) cumulative distribution function $\Phi(y)$ and the empirical cumulative distribution function $F_{n}(y)$. For optimality (control of type I error and low type II error) the two-sample $t$-test assumes that $\Phi^{-1}(F_{n}(y))$, plotted against $y$ separately for each group, yields two straight parallel lines. For optimal power the Wilcoxon test (and the proportional odds ordinal logistic model) assumes that $\textrm{logit}(F_{n}(y))$ yields two parallel curves (they need not be straight lines). The Wilcoxon assumptions are less stringent than the parametric method's assumptions.
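One way to eyeball these conditions is to compute each group's empirical CDF at the Likert levels, apply $\Phi^{-1}$ (probit) and logit, and check whether the vertical gap between the groups stays roughly constant across levels. The Python sketch below does this with the standard library; the helper names and the samples are hypothetical. Levels where either ECDF equals 0 or 1 are dropped because both transforms are infinite there.

```python
from math import log
from statistics import NormalDist

def ecdf_at(sample, y):
    """Empirical CDF F_n(y): proportion of observations <= y."""
    return sum(1 for v in sample if v <= y) / len(sample)

def probit(p):
    return NormalDist().inv_cdf(p)  # Phi^{-1}(p)

def logit(p):
    return log(p / (1 - p))

def transformed_curves(a, b, levels, transform):
    """Return rows (y, T(F_a(y)), T(F_b(y)), gap = T(F_b) - T(F_a)) at each level y
    where both ECDFs lie strictly between 0 and 1."""
    rows = []
    for y in levels:
        fa, fb = ecdf_at(a, y), ecdf_at(b, y)
        if 0 < fa < 1 and 0 < fb < 1:
            ta, tb = transform(fa), transform(fb)
            rows.append((y, ta, tb, tb - ta))
    return rows

# Hypothetical 5-point Likert responses (not the asker's data).
group_a = [2, 3, 3, 2, 4, 3, 1, 2, 3]
group_b = [4, 3, 5, 4, 4, 3, 5, 2, 4, 4, 3, 5, 4, 3, 4, 5, 4]

for name, f in [("probit", probit), ("logit", logit)]:
    print(name)
    for y, ta, tb, gap in transformed_curves(group_a, group_b, range(1, 6), f):
        print(f"  y={y}: A={ta:+.2f}  B={tb:+.2f}  gap={gap:+.2f}")
```

With these hypothetical samples only $y=2$ and $y=3$ survive the filter, which shows how little a 9-observation, 5-point item can reveal about shape. For the $t$-test condition one would additionally check whether each group's probit values are roughly linear in $y$; in practice these curves are usually plotted rather than printed.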

So for optimality the Wilcoxon test assumes that the two distributions, after logit transformation, have the same shape and steepness. But the Wilcoxon test can be successfully used even when this assumption doesn't hold.

Much confusion exists about what the test assumes when you are calculating $P$-values. If you use the normal-distribution shortcut for $P$-values, more assumptions are made, including equal dispersion in the two groups. If you use general $U$-statistic theory to get the standard error of the concordance probability (e.g. the rcorr.cens function in the R Hmisc package), or the likelihood ratio or score $\chi^2$ tests from the proportional odds model, this assumption is not needed.
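To make the $U$-statistic route more concrete, the sketch below estimates the concordance probability $c = P(B > A) + \tfrac{1}{2}P(B = A)$ and a standard error built from per-observation placement values (a DeLong-style estimator), whose variance does not pool the two groups' dispersions. This is an illustration under stated assumptions: it is not claimed to reproduce what rcorr.cens computes, and the helper names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def psi(x, y):
    """Kernel for one (A, B) pair: 1 if B's value is higher, 0.5 if tied, 0 otherwise."""
    if y > x:
        return 1.0
    if y == x:
        return 0.5
    return 0.0

def concordance_with_se(a, b):
    """Return (c, se): c = P(B > A) + 0.5 * P(B = A), with a DeLong-style
    standard error computed from per-observation placement values."""
    va = [mean(psi(x, y) for y in b) for x in a]  # placement of each A observation
    vb = [mean(psi(x, y) for x in a) for y in b]  # placement of each B observation
    c = mean(va)  # mean(vb) gives the same value
    se = sqrt(variance(va) / len(a) + variance(vb) / len(b))
    return c, se

def wald_test(c, se):
    """Two-sided Wald-type z test of H0: c = 0.5, using the unpooled SE."""
    if se == 0:
        raise ValueError("SE is zero (e.g. complete separation); z is undefined")
    z = (c - 0.5) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical 5-point Likert responses (not the asker's data).
group_a = [2, 3, 3, 2, 4, 3, 1, 2, 3]
group_b = [4, 3, 5, 4, 4, 3, 5, 2, 4, 4, 3, 5, 4, 3, 4, 5, 4]

c, se = concordance_with_se(group_a, group_b)
z, p = wald_test(c, se)
print(f"c = {c:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

Here $c$ equals the Mann-Whitney $U$ for group B divided by $n_A n_B$, and $c > 0.5$ indicates that group B tends to give higher responses, so the same quantity also shows the direction of a difference.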