Solved – wilcox.test R question

rstatistical significancewilcoxon-mann-whitney-test

I have a question about using wilcox.test in R.

I have the following code/data:

a <- c(1,1,1,1,1,2,2,2,2)
b <- c(1,1,1,1,1,1,1,1)
results <- wilcox.test(a,b, conf.int = T, exact = F)
print(results)

The code above returns the following:

>   Wilcoxon rank sum test with continuity correction
> 
> data:  a and b W = 52, p-value = 0.04271 alternative hypothesis: true
> location shift is not equal to 0 95 percent confidence interval:  0 1
> sample estimates: difference in location 
>           9.415723e-05

As you can see the p-value is less than 0.05 but is not exact. In fact, when I make exact parameter TRUE then I get the warning message that exact p-value cannot be computed:

1: In wilcox.test.default(a, b, conf.int = T, exact = T) :
  cannot compute exact p-value with ties

Here are my questions:

Even though the p-value is less than 0.05, the confidence interval includes 0. How am I supposed to interpret such data? Is the statistical test significant?
After performing two-tailed Wilcoxon test in R, what is the correct way to interpret which group's distribution is significantly higher/lower? Do you look at the confidence interval or the difference in location?
How do you interpret the test result when the confidence interval cannot be computed because all observations are tied?

Thank you.

Best Answer

These data seem wholly unsuited for analysis with a Wilcoxon rank sum test, which assumes continuous data, at least to the extent of avoiding ties. (This Wilcoxon test uses distributions of ranks, which become difficult to compute--even when there are only a few ties. When there are many ties, as in your data, results of the test are essentially meaningless.)

Here is a permutation test of the null hypothesis that the A and B populations are the same against the alternative that the A population has a larger mean than the B population: There are four 2's among 17 observations. All four are found in the first sample. There is no more extreme result. So the probability of this result by chance alone is the one-sided P-value of the test: $$\frac{{9 \choose 4}{8 \choose 0}}{{17 \choose 4}} = 0.0529.$$ Because the P-value exceeds $0.05$ you cannot reject the null hypothesis at the 5% level of significance. [If you want a 2-sided test that the populations differ, as suggested in @whuber's Comment, then the P-value must also include the probability that all four 2's go into Group B: ${9 \choose 4}{8 \choose 0}/{17 \choose 4} + {9 \choose 0}{8 \choose 4}/{17 \choose 4} = 0.0824.]$

With a one-sided P-value so near to 5% you might say there is 'weakly suggestive' evidence that population A has larger values. If this project warrants additional effort, you might try getting more data, as @Rolland has commented.

Note: Often there are so many possible outcomes in a permutation test that there is no simple combinatorial method to find the P-value. Then, one can get a simulated P-value, as in the R code below, where the 'metric' is the difference in means between the two groups. The result with $m = 100,000$ iterations is that the one-sided P-value is just above $0.05$ and the two-sided P-value is just above $0.08$ (both results are in agreement with the combinatorial computations above).

x = c(1,1, 1,1, 1,2, 2,2,2);  y = c(1,1, 1,1, 1,1, 1,1);  all = c(x, y)
set.seed(727);  m = 10^6;  prm.d = numeric(m);  obs.d = mean(x) - mean(y)
for(i in 1:m) {
  prm = sample(all);  prm.d[i] = mean(prm[1:9]) - mean(prm[10:17])  }
mean(prm.d >= obs.d)
[1] 0.053383                    # aprx P-value of one-sided test
mean(abs(prm.d) >= abs(obs.d))
[1] 0.082756                    # aprx P-value of two-sided test

Ref: For an elementary presentation of permutations tests, see Eudey et al. (2010); two-sample tests are discussed in Sect. 3.

Related Solutions

Solved – How to analyze these data

Sometimes a formal statistical test is overkill. Row by row, the entries in the first column are the largest. Draw a picture to make this apparent: side-by-side boxplots or dotplots would work nicely.

Although this is a post-hoc comparison, if the initial intent had been to compare the first column against the rest for a shift in distribution, the most extreme characterizations would be that either all maxima or all minima occur in the first column (a two-sided test). The chance of this occurring by chance, if all columns contained values drawn at random from a common distribution, would be $2 (\frac{1}{6})^7$ = about 0.0007%.

In fact, the first two contains the largest 7 of the 42 values. Again, ex post facto, the chance of such an extreme ordering occurring equals $\frac{2}{42 \choose 7}$ = about 0.000007%.

These results indicate that any reasonably powerful test you choose to conduct will conclude there's a highly significant difference.

In any event, You don't need a p-value; you need to characterize how large the difference is (the right way to do this depends on what the data mean) and you need to seek an explanation for the difference.

Solved – How to deal with ties when conducting Wilcoxon signed rank test

There are two obstacles to doing a Wilcoxon signed-rank test: (a) You have only 13 non-zero differences among 21. The 0 differences provide no evidence that Q1 and Q2 differ. You may hate to 'discard' these differences, but they were never really there. (b) There are many ties among the non-zero differences; only six unique differences among 13. Because the Wilcoxon test is based on ranks, the existence of so many ties makes it difficult to find a P-value. I simply don't think a Wilcoxon signed rank test is useful for analyzing your data.

In principle, a simulated permutation test can obtain a reasonable approximation of the P-value. However, that does not help find a significant difference for your data. P-values of permutation tests (one or two-sided) on your data are much too big to lead to rejection.

The root difficulty is that 4 of the 13 nonzero differences are positive and 9 are negative, and that is too well balanced an outcome to lead to rejection. (If a fair coin is flipped 13 times, there is better than 1 chance in 4 that it will show Heads either 4 or fewer times or 9 or more times.)

The bottom line is that your data are consistent with the null hypothesis that Q1 and Q2 do not differ.

Addendum: Here is one possible permutation test for your data.

First, a paired t test gives test statistic $T = -0.8495$ with P-value $0.4057.$

t.test(d)

        One Sample t-test

data:  d
t = -0.8495, df = 20, p-value = 0.4057
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.5759201  0.2425867
sample estimates:
 mean of x 
-0.1666667

The legendary robustness of the t test notwithstanding, a t test may not be appropriate for your data because they are so discrete and because they fail a Shapiro-Wilk test of normality with P-value $0.02569.$

shapiro.test(d)

        Shapiro-Wilk normality test

data:  d
W = 0.89302, p-value = 0.02569

The usefulness of the t statistic as a measure of any shift from Q1 to Q2 is not in question. However, because the data are not normal, it is doubtful whether the distribution of the t statistic is Student's t distribution with 20 degrees of freedom.

A permutation test is based on the idea that if there is no shift in values from Q1 to Q2, we could change the signs of the differences without harm. If we change these signs at random many times, and compute the t statistic for each such 'permuted' sample, then we can approximate the 'permutation distribution' of the t statistic, and use that distribution to get a reliable P-value.

There are too many possible permutations to consider them all by combinatorial methods, but simulating many cases gives a serviceable result. The result will be slightly different on each run, but with 100,000 iterations not enough different to affect the conclusion whether to reject the null hypothesis. (Results from a program in R are shown below.)

The run below gave P-value 0.49, which is very much larger than 5%, so we cannot reject the null hypothesis. [The P-value is not enormously different from the P-value of the t test, but enough different that doing the permutation test was worthwhile.]

d = c(-.5, -2, 0, 2.5, -.5,  .5, 0, -.5, -.5, -1,   0, -1, 0 ,0, 1,  0, 0, .5, -1, 0, -1)
n = length(d)
set.seed(706);  m = 10^5
t.obs = t.test(d)$stat;  pv.obs = t.test(d)$p.val
t.prm = replicate(m, t.test(d*sample(c(-1,1), n, rep=T))$stat)
p.val = mean(abs(t.prm) >= abs(t.obs));  p.val
[1] 0.48816

The permutation distribution of the t statistic is highly discrete, but when plotted with only a few histogram bars it is seen not to differ greatly from Student's t distribution with 20 degrees of freedom, the density of which is shown in the figure below.

Reference: Eudey, et al. (2010) provides an elementary introduction to permutation tests; Sect. 3 deals with paired designs.

Best Answer

Related Solutions

Solved – How to analyze these data

Solved – How to deal with ties when conducting Wilcoxon signed rank test

Related Question