Understanding the Variance of the Wilcoxon Signed-Rank Statistic

mathematical-statistics, self-study, variance, wilcoxon-signed-rank

Problem Statement: Let $T$ denote the Wilcoxon signed-rank test statistic for $n$ pairs of observations. Show that
$E(T)=(1/4)n(n+1)$ and $V(T)=(1/24)[n(n+1)(2n+1)]$ when the two populations are identical.

Note: this is Exercise 15.67 in Mathematical Statistics with Applications, 5th Ed., by Wackerly, Mendenhall, and Scheaffer. Also note that $T$ is defined as $T=\min(T^+,T^-),$ where $T^+=$ sum of the ranks of the positive differences and $T^-=$ sum of the ranks of the negative differences.

My Work So Far: If we were to examine the total rank sum, it would be equal to $n(n+1)/2.$ If the populations are
identical, then we would expect half of this total rank sum to go to $T^-,$ and the other half to go to
$T^+,$ making $E(T)=n(n+1)/4.$ A similar argument applies to $E(T^2),$ which we would expect to be
$$E(T^2)=\frac12\sum_{i=1}^ni^2=\frac{n(n+1)(2n+1)}{12}.$$
Then note that
\begin{align*}
V(T)
&=E(T^2)-(E(T))^2\\
&=\frac{n(n+1)(2n+1)}{12}-\frac{n^2(n+1)^2}{16}\\
&=\frac{n(n+1)(4+5n-3n^2)}{48},
\end{align*}

which is clearly not the desired result.

My Question: Where am I going wrong?

Best Answer

Suppose we take two measurements on each of $n$ independent subjects. Let $X_i$ and $Y_i$ denote these measurements for $i = 1, \ldots, n$. Let $Z_i = Y_i - X_i$ and let $R_i$ denote the rank of $|Z_i|$. Assume that there are no ties.

The Wilcoxon signed-rank test statistic is defined as $T = \mbox{min}(T^{+}, T^{-})$. Since we have assumed no ties, $T^{-} = n(n+1)/2 - T^{+}$. Clearly, the variance of $T$ equals the variance of $T^{+}$ since $T^{-}$ is the difference of $T^{+}$ and a constant. The expectation of $T$ can also be shown to equal the expectation of $T^{+}$ under the null hypothesis.
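To make the definitions concrete, here is a minimal sketch (not part of the original answer) of computing $T^{+}$, $T^{-}$, and $T = \min(T^{+}, T^{-})$ from paired data, assuming no ties among the $|Z_i|$ and no zero differences; the function name `signed_rank_T` and the sample data are illustrative choices:

```python
def signed_rank_T(x, y):
    """Return (T_plus, T_minus, T) for paired samples x, y.

    Assumes no ties among |z_i| and no zero differences, so that
    T_minus = n(n+1)/2 - T_plus holds exactly.
    """
    z = [yi - xi for xi, yi in zip(x, y)]
    # Rank the absolute differences: rank 1 goes to the smallest |z_i|.
    order = sorted(range(len(z)), key=lambda i: abs(z[i]))
    ranks = [0] * len(z)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    # T+ is the sum of ranks attached to positive differences.
    t_plus = sum(r for r, zi in zip(ranks, z) if zi > 0)
    n = len(z)
    t_minus = n * (n + 1) // 2 - t_plus
    return t_plus, t_minus, min(t_plus, t_minus)

# Toy data: z = (0.5, -1.0, 1.5, -0.2), so ranks of |z| are (2, 3, 4, 1).
print(signed_rank_T([1.0, 2.0, 3.0, 4.0], [1.5, 1.0, 4.5, 3.8]))  # → (6, 4, 4)
```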

For these types of problems, assumptions about the test statistic under the null hypothesis are not as illuminating as writing the test statistic as a function of random variables. Since the two populations are identical under the null hypothesis and no ties are allowed, we can treat the ranks $R_1, \ldots, R_n$ as known, but the signs of $Z_1, \ldots, Z_n$ as unknown. Let $\psi_i = \mbox{I}\left[Z_i > 0\right]$, where $\mbox{I}\left[\cdot\right]$ denotes the indicator function. Then we may write $T^{+} = \sum_{i=1}^n R_i \psi_i$. Under the null hypothesis, $\psi_i \sim \mbox{Bernoulli}(1/2)$. Hence,
\begin{eqnarray*}
\mbox{E}\left[T^{+}\right] &=& \mbox{E}\left[\sum_{i=1}^n R_i \psi_i\right] \\
&=& \sum_{i=1}^n R_i \mbox{E}\left[\psi_i\right] \\
&=& \frac{1}{2}\sum_{i=1}^n i \\
&=& \frac{n(n+1)}{4}.
\end{eqnarray*}
Likewise, since the $\psi_i$ are independent, the variance of $T^{+}$ is
\begin{eqnarray*}
\mbox{Var}\left[T^{+}\right] &=& \mbox{Var}\left[\sum_{i=1}^n R_i \psi_i\right] \\
&=& \sum_{i=1}^n R_i^2 \mbox{Var}\left[\psi_i\right] \\
&=& \frac{1}{4}\sum_{i=1}^n i^2 \\
&=& \frac{n(n+1)(2n+1)}{24}.
\end{eqnarray*}
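A quick Monte Carlo sanity check (an illustration, not part of the derivation): under the null hypothesis $T^{+} = \sum_i i\,\psi_i$ with $\psi_i \sim \mbox{Bernoulli}(1/2)$ i.i.d., so simulating random signs should reproduce the mean $n(n+1)/4$ and variance $n(n+1)(2n+1)/24$. The helper name `simulate_T_plus` is an assumption for the sketch:

```python
import random

def simulate_T_plus(n, reps, seed=0):
    """Simulate T+ = sum of ranks i with an independent fair coin flip each."""
    rng = random.Random(seed)
    vals = [sum(i for i in range(1, n + 1) if rng.random() < 0.5)
            for _ in range(reps)]
    mean = sum(vals) / reps
    var = sum((v - mean) ** 2 for v in vals) / (reps - 1)  # sample variance
    return mean, var

n = 10
mean, var = simulate_T_plus(n, 200_000)
print(mean, n * (n + 1) / 4)                 # should be close to 27.5
print(var, n * (n + 1) * (2 * n + 1) / 24)   # should be close to 96.25
```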

Now, your reasoning about the second raw moment of $T$ (equivalently, of $T^{+}$) is incorrect: squaring the sum produces cross terms $R_i R_j \psi_i \psi_j$ for $i \ne j$, which the "half of the total" argument ignores. As I have said, it is important to write your test statistic as a function of random variables to avoid such mistakes. Since $\psi_i^2 = \psi_i$, we have $\mbox{E}\left[\psi_i^2\right] = 1/2$, and by independence $\mbox{E}\left[\psi_i \psi_j\right] = 1/4$ for $i \ne j$. The correct derivation of the second moment is as follows:
\begin{eqnarray*}
\mbox{E}\left[\left(T^{+}\right)^2\right] &=& \mbox{E}\left[\left(\sum_{i=1}^n R_i \psi_i\right)^2\right] \\
&=& \sum_{i=1}^n R_i^2 \mbox{E}\left[\psi_i^2\right] + \sum_{i=1}^n \sum_{\substack{j=1 \\ j \ne i}}^n R_i R_j \mbox{E}\left[\psi_i \psi_j\right] \\
&=& \frac{1}{2}\sum_{i=1}^n i^2 + \frac{1}{4}\sum_{i=1}^n \sum_{\substack{j=1 \\ j \ne i}}^n ij \\
&=& \frac{1}{2}\sum_{i=1}^n i^2 + \frac{1}{4}\sum_{i=1}^n i\left[\frac{n(n+1)}{2} - i\right] \\
&=& \frac{n(n+1)(2n+1)}{12} + \frac{n^2(n+1)^2}{16} - \frac{n(n+1)(2n+1)}{24} \\
&=& \frac{n(n+1)(n+2)(3n+1)}{48}.
\end{eqnarray*}
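The closed-form algebra above can be checked exactly with rational arithmetic (an illustrative sketch; the helper name `second_moment_direct` is mine): compute $\frac{1}{2}\sum i^2 + \frac{1}{4}\sum_{i \ne j} ij$ directly and compare it with $n(n+1)(n+2)(3n+1)/48$, and confirm that subtracting $\left[n(n+1)/4\right]^2$ recovers $n(n+1)(2n+1)/24$:

```python
from fractions import Fraction

def second_moment_direct(n):
    """E[(T+)^2] computed term by term: (1/2) sum i^2 + (1/4) sum_{i != j} ij."""
    diag = Fraction(sum(i * i for i in range(1, n + 1)), 2)
    cross = Fraction(sum(i * j for i in range(1, n + 1)
                         for j in range(1, n + 1) if i != j), 4)
    return diag + cross

for n in range(1, 30):
    closed = Fraction(n * (n + 1) * (n + 2) * (3 * n + 1), 48)
    assert second_moment_direct(n) == closed
    var = closed - Fraction(n * (n + 1), 4) ** 2
    assert var == Fraction(n * (n + 1) * (2 * n + 1), 24)
print("identities verified for n = 1..29")
```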
