Prove the relationship between Walsh averages and Wilcoxon signed rank test


In my lecture notes it is stated that

the sum of all positive signed ranks (as defined in the Wilcoxon signed rank test) = the number of Walsh averages that are greater than the (hypothesized) median
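Written out in symbols, and reading "median" as the hypothesized median $\mu_0$ under test (an interpretation, since the notes as quoted don't say which median is meant), the claim is roughly the following, where $R_i$ denotes the rank of $|X_i - \mu_0|$ among $|X_1 - \mu_0|, \ldots, |X_n - \mu_0|$:

$$W^+ \;=\; \sum_{i:\,X_i > \mu_0} R_i \;=\; \#\left\{(i,j) : 1 \le j \le i \le n,\ \frac{X_i + X_j}{2} > \mu_0\right\}$$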

How can I prove this statement?

Best Answer

For clarity: the sample Walsh averages are the pairwise averages $(x_i+x_j)/2$ for $1 \le j \le i \le n$ (so each observation is also averaged with itself, giving $n(n+1)/2$ averages in all). The median of the Walsh averages is the Hodges-Lehmann estimator, also called the pseudo-median.
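A minimal Python sketch of these two definitions, handy for experimenting with small data sets. The function names `walsh_averages` and `hodges_lehmann` are illustrative only, not from any particular library; the sketch uses just the standard library, where `itertools.combinations_with_replacement` yields each pair $j \le i$ exactly once, including each observation paired with itself.

```python
from itertools import combinations_with_replacement
from statistics import median

def walsh_averages(xs):
    """All pairwise averages (x_i + x_j)/2 with j <= i, including i == j."""
    # combinations_with_replacement pairs elements by position, so this gives
    # n(n+1)/2 averages: every distinct pair once, plus each value with itself.
    return [(a + b) / 2 for a, b in combinations_with_replacement(xs, 2)]

def hodges_lehmann(xs):
    """Median of the Walsh averages (the Hodges-Lehmann estimator)."""
    return median(walsh_averages(xs))

xs = [1.0, -2.4, 3.6]
print(sorted(walsh_averages(xs)))
print(hodges_lehmann(xs))
```

With three observations there are $3 \cdot 4 / 2 = 6$ Walsh averages, and their median is the mean of the middle two.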

Here are some (hopefully useful) hints to get you started. They amount to just one of several ways to arrange the calculation methodically, but this arrangement makes the connection between the two calculations easier to see.

Here the observations are from a single sample (in the case of a paired test, the observations are the pair differences, so we are again dealing with a single sample). Further assume that the $X$'s are continuous, so that (with probability 1) there are no zero observations, no tied ranks and no Walsh averages exactly at 0.

Without loss of generality, suppose we're comparing to a specified (hypothesized) median of zero; otherwise, subtract the hypothesized median from every observation first.
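Spelling out why no generality is lost (a short sketch, writing $\mu_0$ for the hypothesized median and $Y_i = X_i - \mu_0$): every Walsh average is shifted by exactly $\mu_0$,

$$\frac{X_i + X_j}{2} - \mu_0 = \frac{Y_i + Y_j}{2}, \qquad\text{so}\qquad \frac{X_i + X_j}{2} > \mu_0 \iff \frac{Y_i + Y_j}{2} > 0,$$

while the signed rank test of median $\mu_0$ is, by definition, computed from the signs of the $Y_i$ and the ranks of the $|Y_i|$. Both sides of the identity are therefore unchanged when the $X$'s are replaced by the $Y$'s.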

Let $X_i = S_i M_i$ where $M_i=|X_i|$ and $S_i = \mathop{\mathrm{sgn}}(X_i)$ (and similarly for $j$, when we deal with pairs of observations).

Then let $R_i = \mathop{\mathrm{rank}}(M_i)$. We now have some notation in place to describe the basic components of the signed rank test.
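A small Python sketch of this notation, assuming the continuous case above (no zeros and no tied $M_i$, so no mid-ranks are needed). The helper names `signed_rank_components` and `w_plus` are hypothetical; `w_plus` adds up the $R_i$ for which $S_i = +1$, i.e. the positive signed rank sum from the question.

```python
def signed_rank_components(xs):
    """Return lists (S, M, R): S_i = sgn(X_i), M_i = |X_i|, R_i = rank(M_i).

    Assumes no zeros and no tied magnitudes (the continuous case),
    so the ranks are simply 1..n with no averaging of tied ranks.
    """
    S = [1 if x > 0 else -1 for x in xs]
    M = [abs(x) for x in xs]
    # Indices sorted by magnitude: the first gets rank 1, the last rank n.
    order = sorted(range(len(xs)), key=lambda i: M[i])
    R = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        R[i] = rank
    return S, M, R

def w_plus(xs):
    """Sum of the positive signed ranks: sum of R_i over i with S_i = +1."""
    S, _, R = signed_rank_components(xs)
    return sum(r for s, r in zip(S, R) if s == 1)

S, M, R = signed_rank_components([1.0, -2.4, 3.6])
print(S, R)                       # [1, -1, 1] [1, 2, 3]
print(w_plus([1.0, -2.4, 3.6]))   # 4
```

On the data $1.0, -2.4, 3.6$ used below, this gives the signed ranks $+1, -2, +3$ and a positive signed rank sum of $1 + 3 = 4$.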

Now order the $X$-values from smallest magnitude to largest magnitude (i.e. sort them by the $M$-values).

(It helps to play with a small numerical example. Consider the data values $1.0, -2.4, 3.6$, say; these have deliberately been ordered as just described.)

Write an $n \times n$ table with row and column headings being the ordered $X_i$ values (you may like to write a small $(S_i,R_i)$ under the columns).

Inside the table, for values on or above the main diagonal, put a "+" if the Walsh average of the corresponding row- and column- $X$-values is above 0 and "-" if it's below it. For each column, count how many "+" values there are.
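One way to build this table programmatically, as a sketch with hypothetical function names `sign_table` and `plus_counts`: the data are first ordered by magnitude, only cells on or above the main diagonal are filled, and (as assumed above) no Walsh average is exactly 0, so every filled cell is either "+" or "-".

```python
def sign_table(xs):
    """Build the upper-triangular '+'/'-' table of Walsh-average signs.

    Values are ordered by magnitude first. Cell [r][c] (r <= c) is '+' if the
    Walsh average of the r-th and c-th ordered values is above 0, '-' otherwise;
    cells below the diagonal stay blank. Assumes no Walsh average equals 0.
    """
    ordered = sorted(xs, key=abs)
    n = len(ordered)
    table = [[" "] * n for _ in range(n)]
    for r in range(n):
        for c in range(r, n):
            avg = (ordered[r] + ordered[c]) / 2
            table[r][c] = "+" if avg > 0 else "-"
    return ordered, table

def plus_counts(table):
    """Number of '+' cells in each column."""
    return [sum(row[c] == "+" for row in table) for c in range(len(table))]

ordered, table = sign_table([3.6, 1.0, -2.4])
for x, row in zip(ordered, table):
    print(f"{x:6.1f}  " + "  ".join(row))
print(plus_counts(table))   # [1, 0, 3]
```

Feeding in the unordered values $3.6, 1.0, -2.4$ recovers the ordering $1.0, -2.4, 3.6$ and column "+" counts of 1, 0 and 3.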

Note that looking down the column for $X_i$, we see the Walsh averages of $X_i$ with every value that is no larger in magnitude than $X_i$ (i.e. with each observation to its left in the ordered list, plus $X_i$ itself).

For the column labelled $X_i$, the count will either be positive or $0$. What determines which of the two things it is? Now note the connection between the column "+" count and $R_i$.
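To explore that question numerically, the sketch below (the function name is hypothetical) lists each value, in order of magnitude, together with its $S_i$, its $R_i$ and its column "+" count. It uses the observation above that the column for the $k$-th smallest magnitude (counting from 1) contains exactly $k$ cells, so $R_i = k$ for that column. Comparing the last two columns across a few data sets is a quick way to form a conjecture before proving it.

```python
def column_counts_vs_signed_ranks(xs):
    """For each value (ordered by magnitude) return (x, S, R, '+' count).

    The column for the k-th ordered value (0-based k) holds the Walsh averages
    of that value with itself and with every value of smaller magnitude.
    Assumes continuous data: no zeros, no tied magnitudes, no Walsh average at 0.
    """
    ordered = sorted(xs, key=abs)
    rows = []
    for k, xk in enumerate(ordered):
        # Count positive Walsh averages in this column (cells on or above the diagonal).
        count = sum((xl + xk) / 2 > 0 for xl in ordered[: k + 1])
        sign = 1 if xk > 0 else -1
        rank = k + 1  # rank of |x_k| among all magnitudes
        rows.append((xk, sign, rank, count))
    return rows

for x, s, r, c in column_counts_vs_signed_ranks([0.3, -1.1, 2.5, -4.0, 5.2]):
    print(f"x={x:5.1f}  S={s:+d}  R={r}  '+' count={c}")
```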

Hopefully you should be able to work your way to an argument from that.


Here's the example I mentioned above. If the $X$'s are $1.0, -2.4, 3.6$, then the signed ranks are $+1$, $-2$ and $+3$:

           1.0  -2.4   3.6
     1.0    +     -      +
    -2.4          -      +
     3.6                 +

    (S,R)  +,1   -,2    +,3

It should be clear that if $S_i = -1$ then there are no "+" entries in the column for $X_i$, but if $S_i = +1$ then that column contains exactly $R_i$ positive Walsh averages. What you need to do is make this observation a bit more formal and then take the small step from there to the required result.
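Before (or after) writing the formal argument, a brute-force check can build confidence in the identity. The sketch below uses hypothetical function names and includes a non-zero hypothesized median `mu0` to exercise the shift argument; it compares the positive signed rank sum with the number of Walsh averages above `mu0` on random continuous samples. A numerical check like this is, of course, not a proof.

```python
import random

def w_plus(xs, mu0=0.0):
    """Positive signed rank sum: ranks of |x - mu0| summed over x > mu0."""
    d = [x - mu0 for x in xs]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    return sum(rank for rank, i in enumerate(order, start=1) if d[i] > 0)

def walsh_above(xs, mu0=0.0):
    """Number of Walsh averages (x_i + x_j)/2, j <= i, strictly above mu0."""
    n = len(xs)
    return sum((xs[i] + xs[j]) / 2 > mu0 for i in range(n) for j in range(i + 1))

rng = random.Random(1)  # fixed seed so the check is reproducible
for trial in range(1000):
    n = rng.randint(1, 15)
    mu0 = rng.uniform(-1, 1)
    # Continuous draws: zeros and ties occur with probability 0.
    xs = [rng.gauss(0.3, 1.0) for _ in range(n)]
    assert w_plus(xs, mu0) == walsh_above(xs, mu0)
print("identity held in all trials")
```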