The Wilcoxon signed rank test is a nonparametric test for two populations when the observations are paired. Using the Wilcoxon signed rank test with two samples, `s1` and `s2`, lets you test the null hypothesis that `s1 - s2` comes from a distribution with zero median and a density that is symmetric about that median (thanks @ttnphns for spotting this). It is not concerned with averages (i.e. means) at any point.
My main concern would be that the Wilcoxon signed rank test requires each pair to be chosen randomly and independently. Your data appear to be part of a time series, so I would suspect a seasonal component comes into play.
I do not think that the Wilcoxon signed rank test assumptions are fulfilled for your particular case. You might want to "bend the rules" and say that each pair is random and independent of the others (so you are OK to use the W.s.r. test) but this is your choice to make.
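For concreteness, here is a minimal sketch of the test for small paired samples. The function name `signed_rank_exact` is made up for illustration, and the sketch assumes no zero differences and no ties among the absolute differences; it enumerates all sign patterns, so it is only practical for small `n`.

```python
from itertools import product


def signed_rank_exact(s1, s2):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (w_plus, p_value), where w_plus is the sum of the ranks of
    |s1 - s2| over the pairs with a positive difference.
    Assumes no zero differences and no tied absolute differences.
    """
    d = [a - b for a, b in zip(s1, s2)]
    n = len(d)
    if any(x == 0 for x in d):
        raise ValueError("zero differences are not handled in this sketch")
    abs_sorted = sorted(abs(x) for x in d)
    if len(set(abs_sorted)) != n:
        raise ValueError("tied absolute differences are not handled in this sketch")

    # Rank the absolute differences (1 = smallest).
    rank = {v: i + 1 for i, v in enumerate(abs_sorted)}
    w_plus = sum(rank[abs(x)] for x in d if x > 0)

    # Under the null, each rank joins the positive sum with probability 1/2,
    # independently, so all 2**n sign patterns are equally likely.
    center = n * (n + 1) / 4
    observed = abs(w_plus - center)
    count = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(range(1, n + 1), signs) if s)
        if abs(w - center) >= observed:
            count += 1
    return w_plus, count / 2 ** n


# Four pairs whose differences are all positive: the most extreme outcome.
print(signed_rank_exact([5.1, 6.3, 7.2, 8.4], [4.0, 5.0, 5.5, 6.0]))
```

Note that nothing in this computation checks the independence of the pairs; that assumption has to be justified from how the data were collected.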
Brief sketch of ARE for the one-sample $t$-test, the sign test and the signed-rank test
I expect the long version of @Glen_b's answer includes a detailed analysis of the two-sample signed rank test along with an intuitive explanation of the ARE, so I'll skip most of the derivation. (For the one-sample case, you can find the missing details in Lehmann's TSH.)
Testing Problem: Let $X_1,\ldots,X_n$ be a random sample from the location model $f(x-\theta)$, where $f$ is symmetric about zero. We want to compute the ARE of the sign test and the signed-rank test relative to the t-test for the hypothesis $H_0: \theta=0$.
To assess the relative efficiency of tests, only local alternatives are considered, because consistent tests have power tending to 1 against any fixed alternative.
Local alternatives that give rise to nontrivial asymptotic power are often of the form $\theta_n=h/\sqrt{n}$ for fixed $h$, which is called Pitman drift in some of the literature.
Our task ahead is
- find the limit distribution of each test statistic under the null
- find the limit distribution of each test statistic under the alternative
- compute the local asymptotic power of each test
Test statistics and asymptotics
- t-test (given the existence of $\sigma$) $$t_n=\sqrt{n}\frac{\bar{X}}{\hat{\sigma}}\to_dN(0,1)\quad \text{under the null}$$
$$t_n=\sqrt{n}\frac{\bar{X}}{\hat{\sigma}}\to_dN(h/\sigma,1)\quad \text{under the alternative }\theta=h/\sqrt{n}$$
- so the test that rejects if $t_n>z_\alpha$ has asymptotic power function
$$1-\Phi\left(z_\alpha-h\frac{1}{\sigma}\right)$$
- sign test $S_n=\frac{1}{n}\sum_{i=1}^{n}1\{X_i>0\}$
$$\sqrt{n}\left(S_n-\frac{1}{2}\right)\to_dN\left(0,\frac{1}{4}\right)\quad \text{under the null }$$
$$\sqrt{n}\left(S_n-\frac{1}{2}\right)\to_dN\left(hf(0),\frac{1}{4}\right)\quad \text{under the alternative }$$ and has local asymptotic power
$$1-\Phi\left(z_\alpha-2hf(0)\right)$$
- signed-rank test, with $R_i$ the rank of $|X_i|$ among $|X_1|,\ldots,|X_n|$ $$W_n=n^{-3/2}\sum_{i=1}^{n}R_i\,\mathrm{sign}(X_i)\to_dN\left(0,\frac{1}{3}\right)\quad \text{under the null }$$
$$W_n\to_dN\left(2h\int f^2,\frac{1}{3}\right)\quad \text{under the alternative }$$
and has local asymptotic power
$$1-\Phi\left(z_\alpha-\sqrt{12}h\int f^2\right)$$
Therefore, $$ARE(S_n)=(2f(0)\sigma)^2$$
$$ARE(W_n)=(\sqrt{12}\int f^2\sigma)^2$$
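To see where these expressions come from (a short sketch, writing $c$ for the slope of a test, which is notation introduced here): suppose a test has local asymptotic power $1-\Phi(z_\alpha-hc)$. Against a fixed small shift $\theta$ with $n$ observations, $h=\sqrt{n}\theta$, so the power is roughly $1-\Phi(z_\alpha-\sqrt{n}\theta c)$. Two tests with slopes $c_1$ and $c_2$ then reach the same power when
$$\sqrt{n_1}\,\theta c_1=\sqrt{n_2}\,\theta c_2\quad\Longleftrightarrow\quad \frac{n_2}{n_1}=\left(\frac{c_1}{c_2}\right)^2$$
Taking test 2 to be the t-test ($c_2=1/\sigma$) and test 1 to be the sign test ($c_1=2f(0)$) or the signed-rank test ($c_1=\sqrt{12}\int f^2$), the ratio $n_2/n_1$ is the ARE relative to the t-test.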
If $f$ is standard normal density, $ARE(S_n)=2/\pi$, $ARE(W_n)=3/\pi$
If $f$ is uniform on [-1,1], $ARE(S_n)=1/3$, $ARE(W_n)=1$
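The two ARE formulas are easy to evaluate numerically for any symmetric density. The following is a rough sketch using a midpoint rule; the helper names `integrate` and `are_sign_and_signed_rank` are made up for illustration, and the integration limits for the normal case are an assumption that the tails beyond $\pm 10$ are negligible.

```python
import math


def integrate(g, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def are_sign_and_signed_rank(f, a, b):
    """ARE of the sign test and signed-rank test relative to the t-test.

    f is a density symmetric about 0 whose mass lies (numerically) in [a, b].
    Uses ARE(S_n) = (2 f(0) sigma)^2 and ARE(W_n) = 12 (int f^2)^2 sigma^2.
    """
    var = integrate(lambda x: x * x * f(x), a, b)   # sigma^2 (mean is 0)
    int_f2 = integrate(lambda x: f(x) ** 2, a, b)   # integral of f^2
    are_sign = 4 * f(0) ** 2 * var
    are_rank = 12 * int_f2 ** 2 * var
    return are_sign, are_rank


def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def uniform_pdf(x):
    return 0.5 if -1 <= x <= 1 else 0.0


print("normal :", are_sign_and_signed_rank(normal_pdf, -10, 10))
print("uniform:", are_sign_and_signed_rank(uniform_pdf, -1, 1))
```

For the standard normal density this gives about 0.637 and 0.955, that is, $2/\pi$ and $3/\pi$.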
Remark on the derivation of distribution under the alternative
There are of course many ways to derive the limiting distribution under the alternative. One general approach is to use Le Cam's third lemma. A simplified version of it states:
Let $\Delta_n$ be the log of the likelihood ratio. For some statistic $W_n$, if
$$ (W_n,\Delta_n)\to_d N\left[\left(\begin{array}{c}
\mu\\
-\sigma^2/2
\end{array}\right),\left(\begin{array}{cc}
\sigma^2_W & \tau \\
\tau & \sigma^2
\end{array}\right)\right]
$$
under the null, then $$W_n\to_d N\left(\mu+\tau,\sigma^2_W\right)\quad\text{under the alternative}$$
For quadratic mean differentiable densities, local asymptotic normality and contiguity are automatically satisfied, so Le Cam's third lemma applies.
Using this lemma, we only need to compute $\mathrm{cov}(W_n,\Delta_n)$ under the null. $\Delta_n$ obeys LAN $$\Delta_n\approx \frac{h}{\sqrt{n}}\sum_{i=1}^{n}l(X_i)-\frac{1}{2}h^2I_0$$ where $l$ is the score function (here $l=-f'/f$ for the location model) and $I_0$ is the Fisher information.
Then, for instance, for the sign test $S_n$
$$\mathrm{cov}(\sqrt{n}(S_n-1/2),\Delta_n)=-h\,\mathrm{cov}\left(1\{X_i>0\},\frac{f'}{f}(X_i)\right)=-h\int_0^\infty f'(x)\,dx=hf(0)$$
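As a sanity check of the same recipe (a sketch, assuming the location-model score $l=-f'/f$, that $xf(x)\to0$ as $|x|\to\infty$, and that $\hat{\sigma}$ can be replaced by $\sigma$ in the limit), the t-statistic gives
$$\mathrm{cov}\left(\sqrt{n}\frac{\bar{X}}{\sigma},\Delta_n\right)=-\frac{h}{\sigma}\mathrm{cov}\left(X_i,\frac{f'}{f}(X_i)\right)=-\frac{h}{\sigma}\int_{-\infty}^{\infty}xf'(x)\,dx=\frac{h}{\sigma}\int_{-\infty}^{\infty}f(x)\,dx=\frac{h}{\sigma}$$
where the third equality is integration by parts. This matches the mean $h/\sigma$ of the limit of $t_n$ under the alternative.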
Best Answer
Klotz looked at small sample power of the signed rank test compared to the one sample $t$ in the normal case.
[Klotz, J. (1963) "Small Sample Power and Efficiency for the One Sample Wilcoxon and Normal Scores Tests" The Annals of Mathematical Statistics, Vol. 34, No. 2, pp. 624-632]
At $n=10$ and $\alpha$ near $0.1$ (exact $\alpha$s aren't achievable of course, unless you go the randomization route, which most people avoid in use, and I think with reason), the relative efficiency to the $t$ at the normal tends to be quite close to the ARE there (0.955). How close depends on the mean shift, and at smaller $\alpha$ the efficiency will be lower. At sample sizes smaller than 10 the efficiency is generally (a little) higher.
At $n=5$ and $n=6$ (both with $\alpha$ close to 0.05), the efficiency was around 0.97 or higher.
So, broadly speaking, the ARE at the normal is an underestimate of the relative efficiency in the small sample case, as long as $\alpha$ isn't small. I believe that for a two-tailed test with $n=4$ your smallest achievable $\alpha$ is 0.125. At that exact significance level and sample size, I think the relative efficiency to the $t$ will be similarly high (perhaps still around 0.97-0.98 or higher) in the area where the power is interesting.
I should probably come back and talk about how to do a simulation, which is relatively straightforward.
Edit:
I've just done a simulation at the 0.125 level (because it's achievable at this sample size). It looks like, across a range of differences in mean, the typical efficiency for $n=4$ is a bit lower, more around 0.95-0.97 or so, similar to the asymptotic value.
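A simulation along these lines can be sketched as follows. This is not the code used for the results above, just one possible setup under stated assumptions: normal data with unit variance, $n=4$, and two-sided level 0.125. At that level the signed-rank test rejects exactly when all four observations share the same sign, and the $t$ critical value comes from the closed-form CDF of the $t$ distribution with 3 degrees of freedom. All function names are made up for illustration.

```python
import math
import random
from statistics import NormalDist

ALPHA = 0.125  # smallest achievable two-sided level for n = 4
N = 4


def t3_cdf(t):
    """CDF of Student's t with 3 degrees of freedom (closed form)."""
    u = t / math.sqrt(3)
    return 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi


def t3_critical(alpha):
    """Two-sided critical value for t with 3 df, found by bisection."""
    lo, hi = 0.0, 100.0
    target = 1 - alpha / 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if t3_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def signed_rank_power_exact(mu):
    """Exact power of the signed-rank test at n = 4, alpha = 0.125.

    The test rejects iff all four observations have the same sign.
    """
    p = NormalDist().cdf(mu)
    return p ** 4 + (1 - p) ** 4


def simulate_power(mu, n_sim=20000, seed=1):
    """Simulated rejection rates (t-test, signed-rank) for N(mu, 1) data."""
    rng = random.Random(seed)
    crit = t3_critical(ALPHA)
    t_rej = w_rej = 0
    for _ in range(n_sim):
        x = [rng.gauss(mu, 1.0) for _ in range(N)]
        m = sum(x) / N
        s2 = sum((v - m) ** 2 for v in x) / (N - 1)
        t_stat = m / math.sqrt(s2 / N)
        t_rej += abs(t_stat) > crit
        w_rej += all(v > 0 for v in x) or all(v < 0 for v in x)
    return t_rej / n_sim, w_rej / n_sim


for mu in (0.0, 0.5, 1.0, 1.5):
    t_pow, w_pow = simulate_power(mu)
    print(mu, t_pow, w_pow, signed_rank_power_exact(mu))
```

Turning the two power curves into a relative efficiency takes one more step, namely finding the (fractional) sample size at which the $t$-test matches the signed-rank power, which this sketch leaves out.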
Update
Here's a plot of the power (2 sided) for the t-test (computed by `power.t.test`) in normal samples, and simulated power for the Wilcoxon signed rank test, using 40000 simulations per point with the t-test as a control variate. The uncertainty in the position of the dots is less than a pixel.

To make this answer more complete I should actually look at the behavior for the case for which the ARE actually is 0.864 (the beta(2,2)).