Solved – wilcox.test R question

Tags: r, statistical-significance, wilcoxon-mann-whitney-test

I have a question about using wilcox.test in R.

I have the following code/data:

a <- c(1,1,1,1,1,2,2,2,2)
b <- c(1,1,1,1,1,1,1,1)
results <- wilcox.test(a,b, conf.int = T, exact = F)
print(results)

The code above returns the following:

>         Wilcoxon rank sum test with continuity correction
> 
> data:  a and b
> W = 52, p-value = 0.04271
> alternative hypothesis: true location shift is not equal to 0
> 95 percent confidence interval:
>  0 1
> sample estimates:
> difference in location 
>           9.415723e-05

As you can see, the p-value is less than 0.05, but it is not exact. In fact, when I set the exact parameter to TRUE, I get a warning that the exact p-value cannot be computed:

1: In wilcox.test.default(a, b, conf.int = T, exact = T) :
  cannot compute exact p-value with ties

Here are my questions:

  1. Even though the p-value is less than 0.05, the confidence interval includes 0. How am I supposed to interpret such data? Is the statistical test significant?

  2. After performing a two-tailed Wilcoxon test in R, what is the correct way to determine which group's distribution is significantly higher or lower? Do you look at the confidence interval or at the difference in location?

  3. How do you interpret the test result when the confidence interval cannot be computed because all observations are tied?

Thank you.

Best Answer

These data seem wholly unsuited to analysis with a Wilcoxon rank sum test, which assumes continuous data, at least to the extent of avoiding ties. (The Wilcoxon test uses distributions of ranks, which become difficult to compute even when there are only a few ties. When there are many ties, as in your data, the results of the test are essentially meaningless.)
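
A quick tabulation (a minimal check, reusing the vectors a and b from the question) makes the extent of the ties plain: only two distinct values occur among the 17 observations.

a <- c(1,1,1,1,1,2,2,2,2);  b <- c(1,1,1,1,1,1,1,1)
table(c(a, b))            # counts of each distinct value
 1  2 
13  4 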

Here is a permutation test of the null hypothesis that the A and B populations are the same against the alternative that the A population has a larger mean than the B population: there are four 2's among the 17 observations, and all four are found in the first sample. There is no more extreme result, so the probability of this outcome by chance alone is the one-sided P-value of the test: $$\frac{{9 \choose 4}{8 \choose 0}}{{17 \choose 4}} = 0.0529.$$ Because the P-value exceeds $0.05$, you cannot reject the null hypothesis at the 5% level of significance. [If you want a two-sided test that the populations differ, as suggested in @whuber's Comment, then the P-value must also include the probability that all four 2's go into Group B: ${9 \choose 4}{8 \choose 0}/{17 \choose 4} + {9 \choose 0}{8 \choose 4}/{17 \choose 4} = 0.0824$.]
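
The same probabilities are easy to reproduce in R with choose(); the dhyper() line below is an equivalent check via the hypergeometric distribution (that particular parameterization is my own framing of the computation, not part of the original argument):

p.one = choose(9,4)*choose(8,0) / choose(17,4)           # all four 2's fall in group A
p.two = p.one + choose(9,0)*choose(8,4) / choose(17,4)   # ... or all four fall in group B
c(p.one, p.two)
[1] 0.05294118 0.08235294
dhyper(4, m=4, n=13, k=9)   # 4 of the 4 twos among the 9 observations drawn for group A
[1] 0.05294118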

With a one-sided P-value so near to 5% you might say there is 'weakly suggestive' evidence that population A has larger values. If this project warrants additional effort, you might try getting more data, as @Rolland has commented.

Note: Often there are so many possible outcomes in a permutation test that there is no simple combinatorial way to find the P-value. Then one can get a simulated P-value, as in the R code below, where the 'metric' is the difference in means between the two groups. The result with $m = 10^6$ iterations is that the one-sided P-value is just above $0.05$ and the two-sided P-value is just above $0.08$ (both in agreement with the combinatorial computations above).

x = c(1,1,1,1,1,2,2,2,2);  y = c(1,1,1,1,1,1,1,1);  all = c(x, y)
set.seed(727);  m = 10^6;  prm.d = numeric(m);  obs.d = mean(x) - mean(y)
for(i in 1:m) {                       # permute group labels m times
  prm = sample(all);  prm.d[i] = mean(prm[1:9]) - mean(prm[10:17]) }
mean(prm.d >= obs.d)
[1] 0.053383                    # approximate P-value of one-sided test
mean(abs(prm.d) >= abs(obs.d))
[1] 0.082756                    # approximate P-value of two-sided test

Ref: For an elementary presentation of permutation tests, see Eudey et al. (2010); two-sample tests are discussed in Sect. 3.