Wilcoxon Rank Sum Test vs Wilcoxon Signed Rank Test – Differences Explained

Tags: paired-data, wilcoxon-mann-whitney-test, wilcoxon-signed-rank

I was wondering what the theoretical difference is between the Wilcoxon rank-sum test and the Wilcoxon signed-rank test for paired observations. I know that the rank-sum test allows different numbers of observations in the two samples, whereas the signed-rank test for paired samples does not, but to me they both seem to test the same thing.

Can someone give me some more background / theoretical information on when one should use the Wilcoxon Rank-Sum Test and when one should use the Wilcoxon Signed-Rank Test using paired observations?

Best Answer

You should use the signed rank test when the data are paired.

You'll find many definitions of pairing, but at heart the criterion is that something makes the pairs of values at least somewhat positively dependent, while unpaired values are not dependent. Often this dependence arises because the two values are observations on the same unit (repeated measures), but they don't have to be on the same unit; it's enough that they measure the same kind of thing and in some way tend to be associated for them to be considered 'paired'.

You should use the rank-sum test when the data are not paired.

That's basically all there is to it.
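To make the distinction concrete, here is a minimal sketch using SciPy, assuming `scipy.stats` is available; the data are made up purely for illustration. `wilcoxon` is the signed-rank test on paired values, and `mannwhitneyu` is the rank-sum (Mann-Whitney) test on two independent samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Paired data: two measurements on the same 12 units (e.g. before/after),
# so the values within a pair are positively dependent.
before = rng.normal(10, 2, size=12)
after = before + rng.normal(0.5, 1, size=12)

# Signed-rank test works on the within-pair differences.
p_paired = stats.wilcoxon(before, after).pvalue

# Unpaired data: two independent samples, which may have different sizes.
group_a = rng.normal(10, 2, size=12)
group_b = rng.normal(10.5, 2, size=15)

# Rank-sum (Mann-Whitney U) test compares the two independent samples.
p_unpaired = stats.mannwhitneyu(group_a, group_b).pvalue

print(p_paired, p_unpaired)
```

Note that `wilcoxon` requires equal-length arguments (one value per pair), while `mannwhitneyu` does not, which mirrors the point about sample sizes above.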

Note that having the same $n$ doesn't mean the data are paired, and having different $n$ doesn't mean that there isn't pairing (it may be that a few pairs lost an observation for some reason). Pairing comes from consideration of what was sampled.

The effect of using a paired test when the data are paired is that it generally gives more power to detect the changes you're interested in. If the association leads to strong dependence*, then the gain in power may be substantial.

* specifically, but speaking somewhat loosely, if the effect size is large compared to the typical size of the pair-differences, but small compared to the typical size of the unpaired-differences, you may pick up the difference with a paired test at a quite small sample size but with an unpaired test only at a much larger sample size.
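The footnote's scenario can be checked with a small simulation, a rough sketch under assumed parameters (a large shared per-unit effect, a small shift, n = 15): when pairs are strongly dependent, the signed-rank test rejects far more often than the rank-sum test at the same sample size.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, shift, reps = 15, 0.5, 500

reject_paired = reject_unpaired = 0
for _ in range(reps):
    # Strongly dependent pairs: the shared unit effect (sd 2) dominates
    # the within-pair noise (sd 0.3), so pair-differences are small.
    unit = rng.normal(0, 2, size=n)
    x = unit + rng.normal(0, 0.3, size=n)
    y = unit + shift + rng.normal(0, 0.3, size=n)
    if stats.wilcoxon(x, y).pvalue < 0.05:
        reject_paired += 1
    if stats.mannwhitneyu(x, y).pvalue < 0.05:
        reject_unpaired += 1

# Empirical power: the paired test detects the shift far more often.
print(reject_paired / reps, reject_unpaired / reps)
```

With these assumed parameters the shift is large relative to the pair-differences but small relative to the between-unit spread, which is exactly the regime the footnote describes.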

However, when the data are not paired, it may be (at least slightly) counterproductive to treat them as paired. That said, the cost in lost power may in many circumstances be quite small: a power study I did in response to this question suggests that, on average, the power loss in typical small-sample situations (say n of the order of 10 to 30 in each sample, after adjusting for differences in significance level) may be surprisingly small.

[If you're somehow really uncertain whether the data are paired or not, the loss from treating unpaired data as paired is usually relatively minor, while the gains may be substantial if they are paired. So if you really don't know, and you have a way of figuring out what would be paired with what assuming they were paired -- such as the values being in the same row of a table -- it may in practice make sense to act as if the data were paired, to be safe, though some people may get quite exercised over your doing that.]