Understanding the Variance of the Wilcoxon Signed-Rank Statistic

mathematical-statistics, self-study, variance, wilcoxon-signed-rank

Problem Statement: Let $T$ denote the Wilcoxon signed-rank test statistic for $n$ pairs of observations. Show that
$E(T)=(1/4)n(n+1)$ and $V(T)=(1/24)[n(n+1)(2n+1)]$ when the two populations are identical.

Note: this is Exercise 15.67 in Mathematical Statistics with Applications, 5th Ed., by Wackerly, Mendenhall, and Scheaffer. Also note that $T$ is defined as $T=\min(T^+,T^-),$ where $T^+=$ sum of the ranks of the positive differences and $T^-=$ sum of the ranks of the negative differences.

My Work So Far: If we were to examine the total rank sum, it would be equal to $n(n+1)/2.$ If the populations are
identical, then we would expect half of this total rank sum to go to $T^-,$ and the other half to go to
$T^+,$ making $E(T)=n(n+1)/4.$ A similar argument applies to $E(T^2),$ which we would expect to be
$$E(T^2)=\frac12\sum_{i=1}^ni^2=\frac{n(n+1)(2n+1)}{12}.$$
Then note that
\begin{align*}
V(T)
&=E(T^2)-(E(T))^2\\
&=\frac{n(n+1)(2n+1)}{12}-\frac{n^2(n+1)^2}{16}\\
&=\frac{n(n+1)(4+5n-3n^2)}{48},
\end{align*}

which is clearly not the desired result.

My Question: Where am I going wrong?

Best Answer

Suppose we take two measurements on each of $n$ independent subjects. Let $X_i$ and $Y_i$ denote these measurements for $i = 1, \ldots, n$. Let $Z_i = Y_i - X_i$ and let $R_i$ denote the rank of $|Z_i|$. Assume that there are no ties.

The Wilcoxon signed-rank test statistic is defined as $T = \mbox{min}(T^{+}, T^{-})$. Since we have assumed no ties, $T^{-} = n(n+1)/2 - T^{+}$. Clearly, the variance of $T$ equals the variance of $T^{+}$ since $T^{-}$ is the difference of $T^{+}$ and a constant. The expectation of $T$ can also be shown to equal the expectation of $T^{+}$ under the null hypothesis.
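To make the definitions concrete, here is a minimal sketch (not part of the original answer) of computing $T^{+}$, $T^{-}$, and $T = \min(T^{+}, T^{-})$ from paired data, assuming no ties among the $|Z_i|$ and no zero differences; the function name `signed_rank_T` and the sample data are illustrative choices:

```python
def signed_rank_T(x, y):
    """Return (T_plus, T_minus, T) for paired samples x, y.

    Assumes no ties among |z_i| and no zero differences, so that
    T_minus = n(n+1)/2 - T_plus holds exactly.
    """
    z = [yi - xi for xi, yi in zip(x, y)]
    # Rank the absolute differences: rank 1 goes to the smallest |z_i|.
    order = sorted(range(len(z)), key=lambda i: abs(z[i]))
    ranks = [0] * len(z)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    # T+ is the sum of ranks attached to positive differences.
    t_plus = sum(r for r, zi in zip(ranks, z) if zi > 0)
    n = len(z)
    t_minus = n * (n + 1) // 2 - t_plus
    return t_plus, t_minus, min(t_plus, t_minus)

# Toy data: z = (0.5, -1.0, 1.5, -0.2), so ranks of |z| are (2, 3, 4, 1).
print(signed_rank_T([1.0, 2.0, 3.0, 4.0], [1.5, 1.0, 4.5, 3.8]))  # → (6, 4, 4)
```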

For these types of problems, assumptions about the test statistic under the null hypothesis are not as illuminating as writing the test statistic as a function of random variables. Since the two populations are identical under the null hypothesis and no ties are allowed, we can treat the ranks $R_1, \ldots, R_n$ as known, but the signs of $Z_1, \ldots, Z_n$ as unknown. Let $\psi_i = \mbox{I}\left[Z_i > 0\right]$, where $\mbox{I}\left[\cdot\right]$ denotes the indicator function. Then we may write $T^{+} = \sum_{i=1}^n R_i \psi_i$. Under the null hypothesis, $\psi_i \sim \mbox{Bernoulli}(1/2)$. Hence,
\begin{eqnarray*}
\mbox{E}\left[T^{+}\right] &=& \mbox{E}\left[\sum_{i=1}^n R_i \psi_i\right] \\
&=& \sum_{i=1}^n R_i \mbox{E}\left[\psi_i\right] \\
&=& \frac{1}{2}\sum_{i=1}^n i \\
&=& \frac{n(n+1)}{4}.
\end{eqnarray*}
Likewise, since the $\psi_i$ are independent, the variance of $T^{+}$ is
\begin{eqnarray*}
\mbox{Var}\left[T^{+}\right] &=& \mbox{Var}\left[\sum_{i=1}^n R_i \psi_i\right] \\
&=& \sum_{i=1}^n R_i^2 \mbox{Var}\left[\psi_i\right] \\
&=& \frac{1}{4}\sum_{i=1}^n i^2 \\
&=& \frac{n(n+1)(2n+1)}{24}.
\end{eqnarray*}
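A quick Monte Carlo sanity check (an illustration, not part of the derivation): under the null hypothesis $T^{+} = \sum_i i\,\psi_i$ with $\psi_i \sim \mbox{Bernoulli}(1/2)$ i.i.d., so simulating random signs should reproduce the mean $n(n+1)/4$ and variance $n(n+1)(2n+1)/24$. The helper name `simulate_T_plus` is an assumption for the sketch:

```python
import random

def simulate_T_plus(n, reps, seed=0):
    """Simulate T+ = sum of ranks i with an independent fair coin flip each."""
    rng = random.Random(seed)
    vals = [sum(i for i in range(1, n + 1) if rng.random() < 0.5)
            for _ in range(reps)]
    mean = sum(vals) / reps
    var = sum((v - mean) ** 2 for v in vals) / (reps - 1)  # sample variance
    return mean, var

n = 10
mean, var = simulate_T_plus(n, 200_000)
print(mean, n * (n + 1) / 4)                 # should be close to 27.5
print(var, n * (n + 1) * (2 * n + 1) / 24)   # should be close to 96.25
```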

Now, your reasoning about the second raw moment of $T$ (equivalently, of $T^{+}$) is incorrect: squaring the sum produces cross terms $R_i R_j \psi_i \psi_j$ for $i \ne j$, which the "half of the total" argument ignores. As I have said, it is important to write your test statistic as a function of random variables to avoid such mistakes. Since $\psi_i^2 = \psi_i$, we have $\mbox{E}\left[\psi_i^2\right] = 1/2$, and by independence $\mbox{E}\left[\psi_i \psi_j\right] = 1/4$ for $i \ne j$. The correct derivation of the second moment is as follows:
\begin{eqnarray*}
\mbox{E}\left[\left(T^{+}\right)^2\right] &=& \mbox{E}\left[\left(\sum_{i=1}^n R_i \psi_i\right)^2\right] \\
&=& \sum_{i=1}^n R_i^2 \mbox{E}\left[\psi_i^2\right] + \sum_{i=1}^n \sum_{\substack{j=1 \\ j \ne i}}^n R_i R_j \mbox{E}\left[\psi_i \psi_j\right] \\
&=& \frac{1}{2}\sum_{i=1}^n i^2 + \frac{1}{4}\sum_{i=1}^n \sum_{\substack{j=1 \\ j \ne i}}^n ij \\
&=& \frac{1}{2}\sum_{i=1}^n i^2 + \frac{1}{4}\sum_{i=1}^n i\left[\frac{n(n+1)}{2} - i\right] \\
&=& \frac{n(n+1)(2n+1)}{12} + \frac{n^2(n+1)^2}{16} - \frac{n(n+1)(2n+1)}{24} \\
&=& \frac{n(n+1)(n+2)(3n+1)}{48}.
\end{eqnarray*}
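The closed-form algebra above can be checked exactly with rational arithmetic (an illustrative sketch; the helper name `second_moment_direct` is mine): compute $\frac{1}{2}\sum i^2 + \frac{1}{4}\sum_{i \ne j} ij$ directly and compare it with $n(n+1)(n+2)(3n+1)/48$, and confirm that subtracting $\left[n(n+1)/4\right]^2$ recovers $n(n+1)(2n+1)/24$:

```python
from fractions import Fraction

def second_moment_direct(n):
    """E[(T+)^2] computed term by term: (1/2) sum i^2 + (1/4) sum_{i != j} ij."""
    diag = Fraction(sum(i * i for i in range(1, n + 1)), 2)
    cross = Fraction(sum(i * j for i in range(1, n + 1)
                         for j in range(1, n + 1) if i != j), 4)
    return diag + cross

for n in range(1, 30):
    closed = Fraction(n * (n + 1) * (n + 2) * (3 * n + 1), 48)
    assert second_moment_direct(n) == closed
    var = closed - Fraction(n * (n + 1), 4) ** 2
    assert var == Fraction(n * (n + 1) * (2 * n + 1), 24)
print("identities verified for n = 1..29")
```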
