Solved – How to deal with ties when conducting Wilcoxon signed rank test

Tags: paired-data, r, wilcoxon-signed-rank

I have paired data with a small sample size (n = 21), and I'd like to conduct a Wilcoxon signed rank test in R. wilcox.test from the base stats package gives results with a warning that the exact p-value cannot be computed with ties and zeroes. Since my data set is small, I cannot afford to discard the ties. I'm also aware of the coin package, which has the wilcoxsign_test function, but I'm not sure whether it gives an exact p-value.

This is what the data look like:

Q1.response Q2.response avg_diff
1 2.5 2.0 -0.5
2 3.5 1.5 -2.0
3 2.0 2.0 0.0
4 1.5 4.0 2.5
5 4.0 3.5 -0.5
6 3.5 4.0 0.5
7 3.0 3.0 0.0
8 2.5 2.0 -0.5
9 4.0 3.5 -0.5
10 3.5 2.5 -1.0
11 3.5 3.5 0.0
12 2.5 1.5 -1.0
13 2.0 2.0 0.0
14 3.0 3.0 0.0
15 1.5 2.5 1.0
16 1.5 1.5 0.0
17 1.5 1.5 0.0
18 2.0 2.5 0.5
19 3.5 2.5 -1.0
20 1.5 1.5 0.0
21 3.0 2.0 -1.0

As you can see, there are several ties. Any suggestions on how to proceed would be greatly appreciated.

Thanks!

Best Answer

There are two obstacles to doing a Wilcoxon signed-rank test: (a) You have only 13 non-zero differences among 21. The 0 differences provide no evidence that Q1 and Q2 differ. You may hate to 'discard' these differences, but they were never really there. (b) There are many ties among the non-zero differences; only six unique differences among 13. Because the Wilcoxon test is based on ranks, the existence of so many ties makes it difficult to find a P-value. I simply don't think a Wilcoxon signed rank test is useful for analyzing your data.
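For reference, the warning the asker mentions, and the normal-approximation P-value that R falls back to, can be reproduced directly (a sketch; d holds the 21 differences Q2 − Q1 from the table, and the variable name is mine):

```r
# Differences d = Q2.response - Q1.response from the table above
d <- c(-.5, -2, 0, 2.5, -.5, .5, 0, -.5, -.5, -1,
       0, -1, 0, 0, 1, 0, 0, .5, -1, 0, -1)
# Warns that the exact p-value cannot be computed with ties (and with zeroes);
# wilcox.test then uses a normal approximation instead.
res <- wilcox.test(d)
res$p.value   # well above 0.05
```
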

In principle, a simulated permutation test can obtain a reasonable approximation of the P-value. However, that does not help find a significant difference for your data. P-values of permutation tests (one or two-sided) on your data are much too big to lead to rejection.

The root difficulty is that 4 of the 13 nonzero differences are positive and 9 are negative, and that is too well balanced an outcome to lead to rejection. (If a fair coin is flipped 13 times, there is better than 1 chance in 4 that it will show Heads either 4 or fewer times or 9 or more times.)
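The coin-flip figure can be checked with a two-sided sign test on the 13 nonzero differences, 4 of which are positive:

```r
# Two-sided sign test: 4 positive signs out of 13, under p = 1/2
p <- binom.test(4, 13, p = 0.5)$p.value
p   # 2186/8192, about 0.267 -- better than 1 chance in 4
```
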

The bottom line is that your data are consistent with the null hypothesis that Q1 and Q2 do not differ.


Addendum: Here is one possible permutation test for your data.

First, a paired t test gives test statistic $T = -0.8495$ with P-value $0.4057.$

t.test(d)

        One Sample t-test

data:  d
t = -0.8495, df = 20, p-value = 0.4057
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.5759201  0.2425867
sample estimates:
 mean of x 
-0.1666667 

The legendary robustness of the t test notwithstanding, a t test may not be appropriate for your data because they are so discrete and because they fail a Shapiro-Wilk test of normality with P-value $0.02569.$

shapiro.test(d)

        Shapiro-Wilk normality test

data:  d
W = 0.89302, p-value = 0.02569

The usefulness of the t statistic as a measure of any shift from Q1 to Q2 is not in question. However, because the data are not normal, it is doubtful whether the distribution of the t statistic is Student's t distribution with 20 degrees of freedom.

A permutation test is based on the idea that if there is no shift in values from Q1 to Q2, we could change the signs of the differences without harm. If we change these signs at random many times, and compute the t statistic for each such 'permuted' sample, then we can approximate the 'permutation distribution' of the t statistic, and use that distribution to get a reliable P-value.

There are too many possible permutations to consider them all by combinatorial methods, but simulating many cases gives a serviceable result. The result will be slightly different on each run, but with 100,000 iterations not different enough to affect the conclusion about whether to reject the null hypothesis. (Results from a program in R are shown below.)

The run below gave P-value 0.49, which is very much larger than 5%, so we cannot reject the null hypothesis. [The P-value is not enormously different from that of the t test, but different enough that doing the permutation test was worthwhile.]

d <- c(-.5, -2, 0, 2.5, -.5, .5, 0, -.5, -.5, -1,
       0, -1, 0, 0, 1, 0, 0, .5, -1, 0, -1)
n <- length(d)
set.seed(706);  m <- 10^5                 # number of permuted samples
t.obs <- t.test(d)$stat                   # observed t statistic
# Flip the signs of the differences at random and recompute t each time
t.prm <- replicate(m, t.test(d * sample(c(-1, 1), n, replace = TRUE))$stat)
p.val <- mean(abs(t.prm) >= abs(t.obs));  p.val
[1] 0.48816

The permutation distribution of the t statistic is highly discrete, but when plotted with only a few histogram bars it is seen not to differ greatly from Student's t distribution with 20 degrees of freedom, the density of which is shown in the figure below.

[Figure: histogram of the simulated permutation distribution of the t statistic, with the density of Student's t distribution with 20 degrees of freedom overlaid.]
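The figure can be reproduced from the same simulation; a self-contained sketch (with the iteration count reduced to 10^4, which is plenty to show the shape):

```r
# Recreate the permutation distribution of t and overlay the t(20) density
d <- c(-.5, -2, 0, 2.5, -.5, .5, 0, -.5, -.5, -1,
       0, -1, 0, 0, 1, 0, 0, .5, -1, 0, -1)
set.seed(706);  m <- 10^4
t.prm <- replicate(m, t.test(d * sample(c(-1, 1), length(d), replace = TRUE))$stat)
hist(t.prm, prob = TRUE, breaks = 30, col = "skyblue2",
     main = "Permutation distribution of t", xlab = "t statistic")
curve(dt(x, 20), add = TRUE, lwd = 2, col = "red")   # Student's t(20) density
```
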

Reference: Eudey et al. (2010) give an elementary introduction to permutation tests; Sect. 3 deals with paired designs.
