The expectation is that there will be dependence within pairs $(x_i,y_i)$, but this is not actually a requirement -- the test will work correctly whether this is true or not. The test is applied to the pair-differences $d_i =y_i-x_i$; if there's positive dependence, taking account of this pairing by taking differences is helpful in reducing variation.
The differences are assumed to be independent of each other -- $d_i$ independent of $d_j$ for $i \neq j$. This is unlikely to be true of time series.
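To see that the test really operates on the differences alone, here's a minimal sketch (data simulated for illustration, assuming SciPy's scipy.stats.wilcoxon): passing the pairs and passing the pair-differences give identical results.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=20)          # first measurement
y = x + rng.normal(0.3, 0.5, size=20)      # second, positively dependent on x

# The test only ever sees d_i = y_i - x_i, so these two calls are equivalent.
stat_pairs, p_pairs = wilcoxon(y, x)
stat_diffs, p_diffs = wilcoxon(y - x)
print(stat_pairs == stat_diffs, p_pairs == p_diffs)   # True True
```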
Continuous dependent variable – Although the Wilcoxon signed rank test ranks the differences according to their size and is therefore a non-parametric test, it assumes that the measurements are continuous.
If they're not, the tabled distribution doesn't apply and the test will depend on the pattern of ties.
To account for the fact that in most cases the dependent variable is binomially distributed, a continuity correction is applied.
This makes no sense to me. How would a continuity correction deal with the problem? In large samples you could retain a normal approximation but use a variance that takes account of the pattern of ties, and in smaller samples you'd attempt to compute or simulate from the permutation distribution.
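Here's a sketch of that small-sample approach (the data are made up for illustration, and include ties in $|d|$): enumerate all $2^n$ sign assignments on the observed midranks, so the permutation distribution reflects the actual pattern of ties rather than the tabulated (no-ties) distribution.

```python
import itertools
import numpy as np
from scipy.stats import rankdata

d = np.array([1.2, -1.2, 0.8, 2.0, -0.8, 2.0, 3.1])  # differences, with ties
r = rankdata(np.abs(d))          # midranks of |d|, so ties are preserved
w_obs = np.sum(r[d > 0])         # observed sum of positive signed ranks
print(w_obs)                     # 23.0

n = len(d)
mu = r.sum() / 2                 # permutation mean of W+
count = 0
for signs in itertools.product([False, True], repeat=n):
    w = np.sum(r[np.array(signs)])
    if abs(w - mu) >= abs(w_obs - mu) - 1e-12:   # two-sided: at least as extreme
        count += 1
p_exact = count / 2 ** n
print(p_exact)
```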
See also the discussion here
Some articles say that "the paired differences should be symmetrical".
The signed rank test is a permutation test on the signed ranks (the ranks of the absolute differences), so if we look at it in that way, then for the signs to be exchangeable under the null (in the sense that every rank would be as likely to have come from a positive as a negative difference), it would seem to require symmetry.
(If you don't have symmetry, then it's not generally the case that under the null you could legitimately reallocate the signs like that - for a given rank one sign would typically be more likely than the other.)
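A quick simulation sketch of that point (data simulated for illustration, assuming SciPy's scipy.stats.wilcoxon): draw differences from an asymmetric distribution whose median is exactly 0 -- Exp(1) shifted down by $\ln 2$ -- and the test rejects noticeably more often than the nominal 5%, because without symmetry one sign is more likely than the other at a given rank.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(42)
n, reps = 25, 1000
rejections = 0
for _ in range(reps):
    # asymmetric differences with median 0: Exp(1) shifted by -ln 2
    d = rng.exponential(1.0, size=n) - np.log(2.0)
    if wilcoxon(d).pvalue < 0.05:
        rejections += 1
rate = rejections / reps
print(rate)   # noticeably above the nominal 0.05
```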
For clarity: The sample Walsh averages are the pairwise averages $(x_i+x_j)/2$, $i=1,2,\ldots,n$, $j=1,\ldots,i$. The median of the Walsh averages is the Hodges-Lehmann estimator, also called the pseudo-median.
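A small sketch of those definitions, using the three example values that appear further down ($1.0, -2.4, 3.6$): enumerate the $n(n+1)/2$ Walsh averages and take their median.

```python
from statistics import median

x = [1.0, -2.4, 3.6]
# all pairwise averages (x_i + x_j)/2 with j <= i, including the self-pairs
walsh = [(x[i] + x[j]) / 2 for i in range(len(x)) for j in range(i + 1)]
hl = median(walsh)   # Hodges-Lehmann estimate (pseudo-median)
print([round(w, 1) for w in sorted(walsh)])   # [-2.4, -0.7, 0.6, 1.0, 2.3, 3.6]
print(round(hl, 6))                           # 0.8
```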
Here are some (hopefully useful) hints to get you started -- which is basically one of several ways to just arrange the calculation methodically, but it makes it easier to see the connections between the two calculations.
Here the observations are from a single sample (in the case of a paired test the observations are the pair-differences, at which point we're dealing with a single sample of pair-differences). Further assume that the $X$'s are continuous (so, for example, there are no tied ranks and no Walsh averages at 0).
Consider without loss of generality that we're comparing to a specified median of zero.
Let $X_i = S_i M_i$ where $M_i=|X_i|$ and $S_i = \mathop{\mathrm{sgn}}(X_i)$ (and similarly for $j$, when we deal with pairs of observations).
Then let $R_i = \mathop{\mathrm{rank}}(M_i)$. We now have some notation in place to describe the basic components of the signed rank test.
Now order the $X$-values from smallest magnitude to largest magnitude (i.e. sort them by the $M$-values).
(It helps to play with a small numerical example. Consider data values $1.0, -2.4, 3.6$ say -- these have deliberately been ordered as just described)
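Using the notation above, here's a quick computation of $M_i$, $S_i$ and $R_i$ for that small example (a sketch assuming SciPy's rankdata for the ranking):

```python
import numpy as np
from scipy.stats import rankdata

x = np.array([1.0, -2.4, 3.6])
m = np.abs(x)                    # M_i = |X_i|
s = np.sign(x).astype(int)       # S_i = sgn(X_i)
r = rankdata(m).astype(int)      # R_i = rank of M_i
signed = (s * r).tolist()
print(signed)                    # signed ranks: [1, -2, 3]
```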
Write an $n \times n$ table with row and column headings being the ordered $X_i$ values (you may like to write a small $(S_i,R_i)$ under the columns).
Inside the table, for values on or above the main diagonal, put a "+" if the Walsh average of the corresponding row- and column- $X$-values is above 0 and "-" if it's below it. For each column, count how many "+" values there are.
Note that looking down a column we're just seeing $X_i$ compared with each other value that is no larger in magnitude than $X_i$ (i.e. with each observation to its left in the ordered list, plus itself).
For the column labelled $X_i$, the count will either be positive or $0$. What determines which of the two things it is? Now note the connection between the column "+" count and $R_i$.
Hopefully you should be able to work your way to an argument from that.
Here's the example I mentioned above. Note that if X's are 1.0, -2.4, 3.6 then the signed ranks are +1, -2 and +3:
        1.0  -2.4   3.6
 1.0     +     -     +
-2.4           -     +
 3.6                 +
(S,R)   +,1   -,2   +,3
It should be clear that if $S_i$ is $-1$ then the column for $X_i$ contains no "+" entries (no positive Walsh averages), but if $S_i=+1$ then it contains $R_i$ of them. What you need to do is make this observation a bit more formal and then argue a small step to the needed result from that.
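To make the observation concrete, here's a sketch that counts the "+" entries in each column of the table above (positive Walsh averages of $X_i$ with each $X_j$, $j \le i$) and compares against the claim:

```python
import numpy as np
from scipy.stats import rankdata

x = np.array([1.0, -2.4, 3.6])            # already sorted by magnitude
r = rankdata(np.abs(x)).astype(int)       # ranks of |x|
# "+" count in column i: positive Walsh averages of x[i] with x[j], j <= i
plus_counts = [int(sum((x[i] + x[j]) / 2 > 0 for j in range(i + 1)))
               for i in range(len(x))]
claimed = [int(r[i]) if x[i] > 0 else 0 for i in range(len(x))]
print(plus_counts, claimed)               # [1, 0, 3] [1, 0, 3]
```

Summing the column counts gives the total number of positive Walsh averages, $1 + 0 + 3 = 4$, which equals the sum of the positive signed ranks ($1 + 3$) -- the connection between the two calculations that the hints are driving at.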
Best Answer
Assumption 1 is needed. Assumption 3 is not strong enough: you need $X$ and $Y$ to be on scales that make differences orderable, which can mean that $X$ and $Y$ are interval scaled. Regarding the distributional assumption, this depends on how you state the hypothesis. If you want to make an inference about the mean difference (and perhaps about the median?) then you assume the distribution of the differences is symmetric. If you want to test the hypothesis that the probability that the sum of a randomly chosen pair of differences exceeds zero is 0.5, then no distributional assumption is needed.