Solved – Mann-Whitney/Wilcoxon rank sum test in R with a table of tallies

rwilcoxon-mann-whitney-test

I have a 10 treatments, 5 samples per treatment, and the result of each treatment is ranked on a ordinal scale: Poor, Fair, Moderate, Good, Excellent. I have the data tallied up so I have the total number of observations for each category on the scale; it looks like this:

           TreatA   TreatB  TreatC    
Poor        2       2       1         
Fair        1       1       1       
Moder       1       2       0       
Good        0       0       2
Exc         1       0       1

Is there a way to do a Mann-Whitney/Wilcoxon rank sum test on data in this format? I have been using the wilcox.test function, but it requires each observation to be listed individually, rather than totalled; that data would need to look more like

TreatA   TreatB
Poor     Poor
Poor     Poor
Fair     Fair
Exc      Fair

The problem is that I actually have 25 treatments and the number of samples per treatment can be more than 60. I would prefer to use the data in the format I have it in currently.

Is there a function in R that will let me perform this test on my data as it is formatted?

Best Answer

Is there a way to do a Mann-Whitney/Wilcoxon rank sum test on data in this format?

Yes. Calculating the test statistic is simple. Your biggest problem will be the heavy ties.

I have been using the wilcox.test function, but it requires each observation to be listed individually, rather than totalled;

There are several ways to get the data into a form that wilcox.test will take. Here's one:

exper <- read.table(stdin(),header=TRUE)
          TreatA   TreatB  TreatC    
Poor        2       2       1         
Fair        1       1       1       
Moder       1       2       0       
Good        0       0       2
Exc         1       0       1

# at this point exper looks like your data
# make it an actual table:
exper <- as.matrix(exper)
class(exper) <- "table"
# convert it:
expertmp <- as.data.frame(exper)

# convert the counts to repetitions:
errle <- list(lengths=expertmp$Freq,values=1:length(expertmp$Freq))
class(errle) <- "rle"
experaw <- expertmp[inverse.rle(errle),1:2]

#need to make it an ordered factor.
experaw$Var1 <- ordered(experaw$Var1)

At this point, the data is in a form you can use. You can use rep rather than inverse.rle (and it's probably a bit shorter that way) to expand counts to repetitions.

e.g., you can do this:

kruskal.test(as.numeric(Var1)~Var2,experaw)

or this:

wilcox.test(as.numeric(Var1)~Var2,experaw[experaw$Var2!="TreatC",])

but (as R warns you in the second case) the p-value with ties is not exact. And unfortunately ties are heavy. You could do a permutation test (/randomization test) and get a "simulated" p-value, which will effectively be exact at a sufficiently large number of resamples.

However, since your observations-per-sample are pretty large, you should be able to get away with a normal approximation with the variance adjusted for ties. There's sufficient information in say Conover's Practical Nonparametric Statistics to do this for oneself.