Wilcoxon Signed Rank – Relative Efficiency in Small Samples

Tags: efficiency, nonparametric, paired-comparisons, statistical-power, wilcoxon-signed-rank

I have seen in published literature (and in posts here) that the asymptotic relative efficiency of the Wilcoxon signed rank test is at least 0.864 when compared to the t test. I have also heard that this only applies to large samples, although some books don't mention this (what's with that?).

Anyway, my question is, how small do things need to get before the above paragraph no longer applies?

In my case I have 4 pairs of data. If all assumptions hold, I know I have at least 90% power to detect an effect size of 2 SD under the paired t test if I use an alpha of 0.1 and have moderately correlated data. However, I would like to use the Wilcoxon signed rank test due to the small sample size and my inability to check assumptions, but I'm concerned the test will have too little power if I do. Thanks!

Best Answer

Klotz looked at small sample power of the signed rank test compared to the one sample $t$ in the normal case.

[Klotz, J. (1963) "Small Sample Power and Efficiency for the One Sample Wilcoxon and Normal Scores Tests" The Annals of Mathematical Statistics, Vol. 34, No. 2, pp. 624-632]

At $n=10$ and $\alpha$ near $0.1$ (a nominal $\alpha$ usually can't be hit exactly, of course, unless you use a randomized test, which most people avoid in practice, and I think with good reason), the relative efficiency to the $t$ at the normal tends to be quite close to the ARE there (0.955). How close depends on the details: it varies with the mean shift, and at smaller $\alpha$ the efficiency will be lower. At sample sizes smaller than 10 the efficiency is generally (a little) higher.
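
For reference, the 0.955 figure comes from the standard textbook expression for the Pitman ARE of the signed rank test relative to the $t$ (this is a general result, not something taken from Klotz's paper). For a symmetric density $f$ with variance $\sigma^2$:

$$\mathrm{ARE}(W, t) = 12\,\sigma^2 \left(\int f(x)^2\,dx\right)^2$$

At the normal, $\int f(x)^2\,dx = \dfrac{1}{2\sigma\sqrt{\pi}}$, so

$$\mathrm{ARE}(W, t) = 12\,\sigma^2 \cdot \frac{1}{4\pi\sigma^2} = \frac{3}{\pi} \approx 0.955$$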

At $n=5$ and $n=6$ (both with $\alpha$ close to 0.05), the efficiency was around 0.97 or higher.

So, broadly speaking, the ARE at the normal is an underestimate of the relative efficiency in the small-sample case, as long as $\alpha$ isn't small. I believe that for a two-tailed test with $n=4$ your smallest achievable $\alpha$ is 0.125. At that exact significance level and sample size, I think the relative efficiency to the $t$ will be similarly high (perhaps still around 0.97-0.98 or higher) in the region where the power is interesting.
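
The 0.125 figure can be checked by enumerating the exact null distribution of the signed rank statistic. Here is a minimal Python sketch, assuming no ties and no zero differences; the function names are made up for illustration:

```python
from fractions import Fraction
from itertools import product


def signed_rank_null(n):
    """Exact null distribution of W+ (sum of the positively signed ranks).

    Under the null each of the 2**n sign patterns is equally likely.
    Assumes no ties and no zero differences.
    """
    counts = {}
    for signs in product((0, 1), repeat=n):
        w = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        counts[w] = counts.get(w, 0) + 1
    total = 2 ** n
    return {w: Fraction(c, total) for w, c in sorted(counts.items())}


def two_sided_levels(n):
    """Levels of the tests that reject when W+ <= k or W+ >= w_max - k."""
    dist = signed_rank_null(n)
    w_max = n * (n + 1) // 2
    levels = []
    lower_tail = Fraction(0)
    for k in range((w_max - 1) // 2 + 1):
        lower_tail += dist.get(k, Fraction(0))
        levels.append(2 * lower_tail)  # symmetric null, so double one tail
    return levels


if __name__ == "__main__":
    for n in (4, 5, 6):
        print(n, float(two_sided_levels(n)[0]))
    # 4 0.125
    # 5 0.0625
    # 6 0.03125
```

With $n=4$ the only two-sided rejection region at the 0.125 level is "all four differences have the same sign", which is worth keeping in mind when thinking about power.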

I should probably come back and talk about how to do a simulation, which is relatively straightforward.
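
As a rough idea of what such a simulation could look like, here is a sketch in Python using only the standard library. It is not the code behind the numbers in this answer. It assumes normally distributed differences with standard deviation 1, so `delta` is the mean shift in units of the SD of the differences, and it hard-codes $n=4$ with a two-sided level of 0.125. The helper names are hypothetical. It compares power only; turning that into a relative efficiency would additionally require finding the sample size ratio that equates the two powers.

```python
import math
import random

ALPHA = 0.125  # smallest two-sided level achievable by the signed rank test at n = 4
N = 4


def t3_cdf(t):
    """CDF of Student's t with 3 degrees of freedom (closed form)."""
    u = t / math.sqrt(3.0)
    return 0.5 + (math.atan(u) + u / (1.0 + u * u)) / math.pi


def t3_quantile(p, lo=0.0, hi=50.0):
    """Invert t3_cdf by bisection; intended for 0.5 <= p < 1."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if t3_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate_power(delta, n_sims=20000, seed=1):
    """Return (t-test power, signed rank power) estimated on the same samples."""
    rng = random.Random(seed)
    t_crit = t3_quantile(1.0 - ALPHA / 2.0)
    t_rej = 0
    w_rej = 0
    for _ in range(n_sims):
        d = [rng.gauss(delta, 1.0) for _ in range(N)]
        mean = sum(d) / N
        var = sum((x - mean) ** 2 for x in d) / (N - 1)
        t_stat = mean / math.sqrt(var / N)
        t_rej += abs(t_stat) > t_crit
        # At n = 4 and level 0.125 the signed rank test rejects only when
        # W+ is 0 or 10, i.e. when all four differences share one sign.
        w_rej += all(x > 0 for x in d) or all(x < 0 for x in d)
    return t_rej / n_sims, w_rej / n_sims


if __name__ == "__main__":
    for delta in (0.0, 0.5, 1.0, 1.5, 2.0):
        t_pow, w_pow = simulate_power(delta)
        print(f"delta={delta:.1f}  t: {t_pow:.3f}  signed rank: {w_pow:.3f}")
```

Because the signed rank rejection region here is just "all same sign", its power under this normal model is also available exactly as $\Phi(\delta)^4 + (1-\Phi(\delta))^4$, which gives a handy check on the simulation.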

Edit:

I've just done a simulation at the 0.125 level (because it's achievable at this sample size). It looks like, across a range of differences in mean, the typical efficiency for $n=4$ is a bit lower than I guessed above, more like 0.95-0.97 or so, which is similar to the asymptotic value.


Update

Here's a plot of the power (2 sided) for the t-test (computed by power.t.test) in normal samples, and simulated power for the Wilcoxon signed rank test - 40000 simulations per point, with the t-test as a control variate. The uncertainty in the position of the dots is less than a pixel:

[Figure: power curve for t and power for Wilcoxon]


To make this answer more complete, I should also look at the behavior in the case for which the ARE actually is 0.864 (the beta(2,2)).
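
As a quick check that a beta(2,2)-shaped density gives exactly that bound, take the standard Pitman ARE expression $12\,\sigma^2\left(\int f^2\right)^2$ and the centered, rescaled version of the beta(2,2) density (the scale $c$ is arbitrary and drops out):

$$f(x) = \frac{3}{4c}\left(1 - \frac{x^2}{c^2}\right), \quad |x| \le c$$

$$\sigma^2 = \int_{-c}^{c} x^2 f(x)\,dx = \frac{c^2}{5}, \qquad \int_{-c}^{c} f(x)^2\,dx = \frac{9}{16c}\cdot\frac{16}{15} = \frac{3}{5c}$$

$$\mathrm{ARE}(W, t) = 12 \cdot \frac{c^2}{5} \cdot \frac{9}{25c^2} = \frac{108}{125} = 0.864$$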