Solved – Clarification on Mann-Whitney-Wilcoxon Test

Tags: distributions, r, wilcoxon-mann-whitney-test

I was looking to get some advice on some analysis using R, and I want to ensure I am doing everything correctly.

I want to compare test scores between two groups (A and B). I have two tests but wish to look at each individually, i.e. compare scores between groups A and B for Test 1, then do the same for Test 2.

Here is my example data:

exampledata1 <- read.table(text="Unique ID,Test.1,Test.2,Group
544001,3,1,A
668149,0,0,A
314205,0,1,A
646843,1,1,A
599014,0,1,A
502115,1,0,A
123725,2,2,A
170541,3,3,A
181769,1,1,A
928772,0,0,A
448116,0,1,A
287093,0,0,A
754849,1,1,A
862914,0,0,A
390557,0,0,A
749690,0,0,A
423885,1,0,A
831307,0,0,A
642633,1,,A
960592,0,0,A
162882,1,1,A
762726,0,0,A
463306,0,1,A
236706,1,1,A
979328,0,1,A
590783,3,3,A
821588,1,0,A
112601,1,1,A
921085,2,1,A
336733,0,0,A
315681,0,0,A
237933,1,1,A
698807,0,0,A
720256,1,1,A
525437,0,1,A
735806,0,0,A
260575,1,2,A
763688,1,1,A
114882,0,1,A
525912,3,1,A
972047,0,1,A
333651,2,1,A
620004,3,3,B
910745,0,0,B
837662,1,1,B
943322,1,0,B
907396,2,2,B
522464,3,3,B
637028,0,0,B
929821,2,2,B
831377,0,0,B
273030,,,B
978827,1,0,B
725910,0,0,B
519596,3,3,B
484731,1,2,B
344732,3,3,B
604679,1,2,B
213215,0,0,B
595850,0,0,B
286045,3,1,B
192411,2,2,B
747516,1,0,B
729803,1,1,B
266336,1,1,B
527596,,,B
515154,0,1,B
356337,1,1,B
245176,1,1,B
599492,1,1,B
713802,1,1,B
285520,0,0,B
254784,2,1,B
396954,1,1,B
918426,1,1,B
895730,1,1,B
436572,1,1,B
106052,1,2,B
880444,1,1,B
834328,1,1,B
180569,1,1,B
383651,2,1,B
547905,1,1,B
222952,1,1,B", header = TRUE, sep = ",")

You may have noticed that my example data include some missing values. This is true of my real data, so I have included them here in case it makes a difference.

I have tested the data for normality and found that they are not normally distributed.

> shapiro.test(exampledata1$Test.1)

Shapiro-Wilk normality test

data:  exampledata1$Test.1
W = 0.8043, p-value = 4.618e-09

As there are two groups, the data are not normally distributed, and there are no repeated measures, I have opted to use the Mann-Whitney-Wilcoxon test:

wilcox.test(Test.1 ~ Group, data=exampledata1)

Wilcoxon rank sum test with continuity correction

data:  Test.1 by Group
W = 597.5, p-value = 0.01603
alternative hypothesis: true location shift is not equal to 0

Warning message:
In wilcox.test.default(x = c(3L, 0L, 0L, 1L, 0L, 1L, 2L, 3L, 1L,  :
cannot compute exact p-value with ties

The reason I am not sure whether this is correct is that I thought this test used ranking, and the output states "cannot compute exact p-value with ties". As my scores only consist of 0, 1, 2, and 3, I wondered whether this would affect my result, since every score is repeated a number of times.

Secondly, once I have done this, I want to look at the difference between scores in Test 1 and Test 2 for groups A and B. To do this, would I just work out the difference between the scores for each participant and then perform the same test?

EDIT: Thank you both for your help; I have opted to use the categorical analysis as this is more in line with my data.

I wondered if you may be able to advise on one other thing: having used Fisher's exact test on the 2×5 table for each test, I found a p-value of 0.03 for Test 1. However, I am unsure how to determine in what way the groups are significantly different. Any help would be appreciated.

Best Answer

The Fisher test is one way to test whether there are differences in the distribution of cases among the cells of a contingency table. If these sample data have many fewer cases than your "real" data, then a chi-square test might be better. Neither test specifies exactly how the distribution differs. Typically a display of the contingency table is fairly self-explanatory, although if you are writing this up for others you might want to add some discussion based on your knowledge of the subject matter.
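For reference, that contingency-table analysis might look something like this (a minimal sketch using your example data; useNA = "ifany" keeps the "missing" category discussed below and can be dropped if you exclude missing values):

# Build the 2 x k table of Group against Test.1 score categories
tab1 <- table(exampledata1$Group, exampledata1$Test.1, useNA = "ifany")
tab1                # inspecting the table often shows where the groups differ

fisher.test(tab1)   # exact test; appropriate for small cell counts
chisq.test(tab1)    # approximation; preferable with larger samples, and it
                    # will warn here because some expected cell counts are small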

It seems that you have added a category of "missing" to your 4 numeric categories. That might not be wise. Handling of missing data is tricky, particularly if the probability of a value's being missing is somehow related to its actual value. The missing-data tag on this site has many useful posts. You might also follow the multiple-imputation tag to see if that is appropriate or helpful for your application. In your case, without imputation, I might first determine whether groups A and B differed in the frequency of missing data, then proceed to analyze the non-missing values.
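As a sketch of that first step, one simple way to check whether missingness differs by group:

miss1 <- is.na(exampledata1$Test.1)
table(exampledata1$Group, miss1)               # missing vs. non-missing per group
fisher.test(table(exampledata1$Group, miss1))  # 2 x 2 exact test of that difference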

If you are interested in some measure of the average difference in scores between groups A and B, and particularly if you are interested in whether there are differences between Test 1 and Test 2, or interactions between the Groups and the Tests, then you should instead follow @whuber's recommendation in a comment and just treat the values as numeric in a linear model or ANOVA. As he notes, even though your data can't be normally distributed, in practice with large numbers of cases and these types of data the normal-theory methods will generally work well enough. After thinking about this, I prefer his suggestion over mine, as you have an ordered numeric score rather than simply a set of arbitrary non-ordered categories.
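A sketch of that approach in base R: reshape the data so each row holds one score, then fit a linear model with Group, Test, and their interaction. (Note that read.table converted the "Unique ID" header to Unique.ID, and that each participant contributes two scores, so a mixed model would be needed to account fully for that pairing.)

long <- reshape(exampledata1, direction = "long",
                varying = c("Test.1", "Test.2"), v.names = "score",
                timevar = "test", times = c("Test.1", "Test.2"),
                idvar = "Unique.ID")
fit <- lm(score ~ Group * test, data = long)   # rows with NA scores are dropped by default
anova(fit)   # tests for Group, Test, and Group:Test (interaction) effects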

Note that the default for numeric tests in R (whether lm or wilcox.test) is to ignore missing data. So you should try to understand the nature of the data's missingness, whether you proceed with contingency tables or with numeric data analysis.
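A quick way to see how much data those defaults will silently drop:

colSums(is.na(exampledata1[, c("Test.1", "Test.2")]))  # NA count per test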

Finally, note that the "Warning" from wilcox.test was just that: a warning rather than an indication that your results are incorrect. The test was OK, reporting a p-value based on a normal approximation rather than an exact p-value based on the data values. If you had 50 or more cases, your call to wilcox.test would not even have tried to calculate exact p-values. The coin package in R has a wilcox_test function that can calculate exact p-values in the presence of ties, but I see no need here for an exact p-value.
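For completeness, the exact version might look like this (a sketch assuming the coin package is installed):

library(coin)
exampledata1$Group <- factor(exampledata1$Group)   # coin requires a factor grouping variable
d1 <- exampledata1[!is.na(exampledata1$Test.1), ]  # drop missing Test.1 scores explicitly
wilcox_test(Test.1 ~ Group, data = d1, distribution = "exact")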
