- If you don't have ties, I would report the proportion of after values that are less than the corresponding before values.
- If you do have ties, you could report the proportion of after values that are less than before out of the total number of non-tied pairs, or report all three proportions (<, =, >), perhaps along with the sum of whichever two are more meaningful. For example, you could say '33% had less fear of statistics, 57% were unchanged, and 10% had more fear after the course, such that 90% were the same as or better than before' (see the sketch just below this list).
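For concreteness, here is a minimal R sketch of those proportions, using made-up `before` and `after` fear scores:

before = c(5, 4, 4, 3, 5, 2, 4, 3, 5, 4)  # hypothetical pre-course scores
after  = c(3, 4, 2, 3, 4, 2, 4, 3, 4, 5)  # hypothetical post-course scores
d = after - before
mean(d < 0)               # proportion with less fear afterwards
mean(d == 0)              # proportion unchanged (ties)
mean(d > 0)               # proportion with more fear afterwards
sum(d < 0) / sum(d != 0)  # proportion improved among the non-tied pairs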
Generally speaking, a hypothesis test outputs a p-value that can be used to decide whether or not to reject the null hypothesis while controlling the type I error rate. The p-value, however, conflates the size of the effect with the amount of evidence that it is inconsistent with the null (in essence, how much data the test had access to). An effect size measure generally tries to factor out the $N$ so as to isolate the magnitude of the effect; that line of reasoning illuminates the rationale behind dividing $z$ by $\sqrt N$. However, a major consideration with effect size measures is interpretability. Most commonly that consideration plays out in choosing between a raw effect size and a standardized effect size. (I suppose we could call $z/\sqrt N$ a standardized effect size, for what that's worth.) At any rate, my guess is that reporting $z/\sqrt N$ won't give people a quick, straightforward intuition about your effect.
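As a sketch of that calculation, assuming `x` holds the paired differences: the $z$ below is recovered from the normal approximation to the signed-rank statistic (no continuity or tie correction), so it is only approximate for small samples.

n = length(x)
V = wilcox.test(x)$statistic  # signed-rank statistic V
z = (V - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)
unname(z / sqrt(n))           # the z / sqrt(N) effect size discussed above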
There is another wrinkle, though. While you want an estimate of the size of the overall effect, people typically use the Wilcoxon signed-rank test with data that are only ordinal; that is, they don't trust that the data can reliably indicate the magnitude of the shift within a student, only that a shift occurred. That brings me back to the proportion improved discussed above.
On the other hand, if you do trust that the values are intrinsically meaningful (e.g., you only used the signed rank test for its robustness to normality and outliers), you could just use a raw mean or median difference, or the standardized mean difference as a measure of effect.
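A sketch of those alternatives, again assuming paired `before` and `after` vectors as above:

d = after - before
mean(d)          # raw mean difference
median(d)        # raw median difference
mean(d) / sd(d)  # standardized mean difference for paired data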
You can use the `qsignrank()` function. Example:
> qsignrank(.025, 10, lower.tail=FALSE)
46
This means that for a sample size of 10 and a two-sided test with a significance level of 5%, the test statistic must be greater than 46 (i.e., 47 or greater) to be statistically significant. Example data:
> set.seed(1)
> x = rnorm(10, .5)
> wilcox.test(x)
Wilcoxon signed rank test
data: x
V = 47, p-value = 0.04883
alternative hypothesis: true location is not equal to 0
Here the test statistic is 47, which is significant at the 5% level.
Note that for a two-sided test, the test statistic returned by `qsignrank()` is the larger of the two possible test statistics. For example, `wilcox.test(-x)` gives a test statistic of 8, which can be transformed into 47 by $\frac{10\cdot 11}{2}-8$.
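Continuing the session above, this relationship can be checked directly:

> wilcox.test(-x)$statistic
V 
8 
> 10 * 11 / 2 - 8
[1] 47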
The Hodges-Lehmann statistic is the estimator associated with the Wilcoxon signed-rank test. Form the $n(n+1)/2$ pairwise averages $(x_i + x_j)/2$ for $i \leq j = 1, \dots, n$; take the median; and there you are.
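In R, this is easy to do by hand (a minimal sketch, assuming a numeric sample `x`):

w = outer(x, x, "+") / 2              # all pairwise averages (x_i + x_j) / 2
walsh = w[lower.tri(w, diag = TRUE)]  # keep each pair once (i >= j)
median(walsh)                         # the Hodges-Lehmann point estimate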
I strongly suspect there's a procedure in SPSS for it, although a quick search of the web didn't turn up anything. In R there's the exactRankTests package, whose `wilcox.test` has an option that will return both the point estimate and a confidence interval.

Edit: I notice that although exactRankTests is still available, it's no longer being developed and the coin package is recommended instead. It, too, has `wilcox.test`, and the syntax looks the same.
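For what it's worth, base R's `wilcox.test` can do this too: passing `conf.int = TRUE` makes it report the Hodges-Lehmann point estimate (labelled "(pseudo)median" in the output) along with a confidence interval:

> wilcox.test(x, conf.int = TRUE)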