Solved – Prove the relationship between Walsh averages and Wilcoxon signed rank test

nonparametric, self-study, wilcoxon-signed-rank

In my lecture notes it is stated that

sum of all positive signed ranks (as defined in the Wilcoxon signed rank test) = the number of Walsh averages that are greater than the hypothesized median

How can I prove this statement?

Best Answer

For clarity: the sample Walsh averages are the pairwise averages $(x_i+x_j)/2$, $i=1,2,\ldots,n$, $j=1,\ldots,i$. The median of the Walsh averages is the Hodges-Lehmann estimator, also called the pseudo-median.
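As a quick illustration of that definition, here is a minimal Python sketch. The function name `walsh_averages` and the three data values are only illustrative choices, not part of any library.

```python
from itertools import combinations_with_replacement
from statistics import median

def walsh_averages(xs):
    """All pairwise averages (x_i + x_j) / 2 over unordered pairs, including i == j."""
    return [(a + b) / 2 for a, b in combinations_with_replacement(xs, 2)]

x = [1.0, -2.4, 3.6]      # illustrative data
w = walsh_averages(x)     # n(n+1)/2 = 6 averages for n = 3
hl = median(w)            # Hodges-Lehmann estimate: median of the Walsh averages
print(sorted(w), hl)
```

Note that each observation is paired with itself, so the observations themselves are among the Walsh averages.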

Here are some (hopefully useful) hints to get you started. This is basically one of several ways to arrange the calculation methodically, but it makes it easier to see the connections between the two calculations.

Here the observations are from a single sample (in the case of a paired test the observations are the pair-differences, at which point we're dealing with a single sample of pair-differences). Further assume that the $X$'s are continuous (so, for example, there are no tied ranks and no Walsh averages equal to 0).

Consider without loss of generality that we're comparing to a specified median of zero.

Let $X_i = S_i M_i$ where $M_i=|X_i|$ and $S_i = \mathop{\mathrm{sgn}}(X_i)$ (and similarly for $j$, when we deal with pairs of observations).

Then let $R_i = \mathop{\mathrm{rank}}(M_i)$. We now have some notation in place to describe the basic components of the signed rank test.
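In code, this notation might look like the following sketch. It assumes no ties and no zeros, as above; the helper name `signed_ranks` and the data are made up for illustration.

```python
def signed_ranks(xs):
    """Return S_i * R_i for each x_i, where R_i is the rank of M_i = |x_i|.

    Assumes continuous data: no tied magnitudes and no zeros.
    """
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i]))
    ranks = [0] * len(xs)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return [r if x > 0 else -r for x, r in zip(xs, ranks)]

x = [3.6, 1.0, -2.4]                    # illustrative data, not sorted
sr = signed_ranks(x)                    # [3, 1, -2]
w_plus = sum(r for r in sr if r > 0)    # sum of positive signed ranks: 4
print(sr, w_plus)
```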

Now order the $X$-values from smallest magnitude to largest magnitude (i.e. sort them by the $M$-values).

(It helps to play with a small numerical example. Consider data values $1.0, -2.4, 3.6$ say -- these have deliberately been ordered as just described)

Write an $n \times n$ table with row and column headings being the ordered $X_i$ values (you may like to write a small $(S_i,R_i)$ under the columns).

Inside the table, for values on or above the main diagonal, put a "+" if the Walsh average of the corresponding row- and column- $X$-values is above 0 and "-" if it's below it. For each column, count how many "+" values there are.

Note that looking down a column we're just seeing $X_i$ compared with each other value that is no larger in magnitude than $X_i$ (i.e. with each observation to its left in the ordered list, plus itself).

For the column labelled $X_i$, the count will either be positive or $0$. What determines which of the two things it is? Now note the connection between the column "+" count and $R_i$.

Hopefully you should be able to work your way to an argument from that.

Here's the example I mentioned above. Note that if the $X$'s are $1.0, -2.4, 3.6$ then the signed ranks are $+1$, $-2$ and $+3$:

           1.0  -2.4   3.6
     1.0    +     -      +
    -2.4          -      +
     3.6                 +

    (S,R)  +,1   -,2    +,3

It should be clear that if $S_i=-1$ then the column for $X_i$ contains no "+" entries (no positive Walsh averages), but if $S_i=+1$ then it contains exactly $R_i$ of them. What you need to do is make this observation a bit more formal and then argue a small step to the needed result from that.
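If you'd like to check the claim numerically before trying to prove it, here is a small Python sketch. The function names are my own, and it assumes no ties and no zeros. It is a sanity check on simulated data, not a proof.

```python
import random

def column_plus_counts(xs):
    """Sort by magnitude, then count the positive Walsh averages in each column.

    Column j pairs x_j with every x_i for i <= j (entries on or above the diagonal).
    """
    xs = sorted(xs, key=abs)
    return [sum((xs[i] + xs[j]) / 2 > 0 for i in range(j + 1))
            for j in range(len(xs))]

def positive_rank_sum(xs):
    """Sum of the ranks of |x| over the positive observations."""
    xs = sorted(xs, key=abs)
    return sum(rank for rank, x in enumerate(xs, start=1) if x > 0)

counts = column_plus_counts([1.0, -2.4, 3.6])   # [1, 0, 3], matching the table
print(counts, sum(counts), positive_rank_sum([1.0, -2.4, 3.6]))

# Compare the two totals on many random continuous samples.
random.seed(0)
for _ in range(1000):
    xs = [random.gauss(0, 1) for _ in range(random.randint(1, 12))]
    assert sum(column_plus_counts(xs)) == positive_rank_sum(xs)
```

Each column count is either $0$ or $R_i$, which is the observation the proof needs to formalise.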

Related Solutions

Wilcoxon Signed-Rank Test – Assumptions and Null Hypothesis (H0) for Nonparametric Testing

Assumption 1 is needed. Assumption 3 is not strong enough. You need X and Y to be on scales that make differences orderable, which can mean that X and Y are interval scaled. The distributional assumption depends on how you state the hypothesis. If you want to make an inference about the mean difference (and perhaps about the median?) then you assume the distribution of the differences is symmetric. If you want to test the hypothesis that the probability that the sum of a randomly chosen pair of differences exceeds zero is 0.5 then no distributional assumption is needed.

Solved – Error in scipy Wilcoxon signed-rank test

First, the wilcoxon test in scipy.stats does NOT use $W$ as the test statistic; it instead uses $T$ as defined in Siegel's popular book, Non-parametric statistics for the behavioral sciences. And yes, as @whuber correctly pointed out, once you know $T$ and the sample size, $W$ is also determined (@whuber, strictly speaking, not quite: one also needs to know how zero differences are handled).

One can only know how the test is implemented by reading the source code. For scipy, the Wilcoxon test can be found in your_python_package_folde/scipy/stats/morestats.py. Compared to R's wilcox.test, it is very simple. Go over the code, and you will see that it is equivalent to setting the correct=FALSE, exact=FALSE, paired=TRUE flags in R.

Python:

>>> from scipy import stats
>>> x1=[48,  7, 12, 11, 62, 93, 79, 53, 28, 49, 74, 59, 57, 62, 22,  8, 30, 11,  2, 47]
>>> x2=[20, 13, 41, 61, 93, 11, 28, 61, 26, 91, 95,  5, 80, 45, 88, 99, 50, 96, 69, 93]
>>> stats.wilcoxon(x1, x2) # T and p value, two-sided
(60.0, 0.092963126712486244)

in R:

> x1<-c(48,  7, 12, 11, 62, 93, 79, 53, 28, 49, 74, 59, 57, 62, 22,  8, 30, 11,  2, 47)
> x2<-c(20, 13, 41, 61, 93, 11, 28, 61, 26, 91, 95,  5, 80, 45, 88, 99, 50, 96, 69, 93)
> wilcox.test(x1,x2,correct=FALSE,exact=FALSE,paired=TRUE)

    Wilcoxon signed rank test

data:  x1 and x2 
V = 60, p-value = 0.09296
alternative hypothesis: true location shift is not equal to 0
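To see where these numbers come from without relying on either library, the calculation can be sketched by hand in plain Python. This is only an illustration under stated assumptions: $T$ is taken to be the smaller of the positive and negative rank sums, the p-value uses the normal approximation with no continuity correction (mirroring correct=FALSE, exact=FALSE), and the data have no zero differences and no tied magnitudes, which holds for this sample.

```python
import math

x1 = [48, 7, 12, 11, 62, 93, 79, 53, 28, 49, 74, 59, 57, 62, 22, 8, 30, 11, 2, 47]
x2 = [20, 13, 41, 61, 93, 11, 28, 61, 26, 91, 95, 5, 80, 45, 88, 99, 50, 96, 69, 93]

d = [a - b for a, b in zip(x1, x2)]                    # paired differences
order = sorted(range(len(d)), key=lambda i: abs(d[i]))
rank = {i: r for r, i in enumerate(order, start=1)}    # ranks of |d|, assuming no ties

w_plus = sum(rank[i] for i in range(len(d)) if d[i] > 0)    # what R reports as V
w_minus = sum(rank[i] for i in range(len(d)) if d[i] < 0)
t = min(w_plus, w_minus)                                    # assumed definition of T

n = len(d)
mean = n * (n + 1) / 4
sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (t - mean) / sd
p = math.erfc(abs(z) / math.sqrt(2))    # two-sided tail of the standard normal

print(w_plus, w_minus, t, round(p, 5))  # 60 150 60 0.09296
```

Here the positive rank sum happens to be the smaller one, which is why R's V and the $T$ value coincide for this sample.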
