Getting inconsistent results from Wilcoxon signed-rank test and Kruskal-Wallis test

hypothesis-testing, kruskal-wallis-test, nonparametric, repeated-measures, wilcoxon-signed-rank

I am doing a study on 3 drugs, comparing response pre- and post-treatment. My objective is to know whether these drugs are effective and which one is better. I used nonparametric tests since the results were not normally distributed and could not be transformed to normality. Two drugs showed a significant pre-post difference on the Wilcoxon signed-rank test, and the third showed no significant difference. Yet when I compared the post-treatment results across the three groups using the Kruskal-Wallis test, no significant difference was observed, and the pre-treatment results also showed no significant difference between the groups on the Kruskal-Wallis test. Why is that? Did I choose the wrong tests?

Best Answer

What you should always keep in mind is that "The Difference Between “Significant” and “Not Significant” is not Itself Statistically Significant". There is a nice paper with this title by Gelman and Stern, but the idea is very simple. Here is how they start explaining it:

Consider two independent studies with effect estimates and standard errors of 25±10 and 10±10. The first study is statistically significant at the 1% level, and the second is not at all statistically significant, being only one standard error away from 0. Thus, it would be tempting to conclude that there is a large difference between the two studies. In fact, however, the difference is not even close to being statistically significant: the estimated difference is 15, with a standard error of $\sqrt{10^2+10^2}=14$.
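The arithmetic in this quoted example can be checked with a few lines of Python. This is only a sketch: it assumes each estimate is judged with a two-sided normal (z) test, which the quote does not state explicitly, and the helper name `two_sided_p` is made up for illustration.

```python
import math

def two_sided_p(estimate, std_err):
    """Return (z, two-sided p-value) under a normal approximation (an assumption)."""
    z = estimate / std_err
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2))

# The two studies from the quoted example: 25 +/- 10 and 10 +/- 10.
z1, p1 = two_sided_p(25, 10)
z2, p2 = two_sided_p(10, 10)

# Difference between the two independent estimates and its standard error.
diff = 25 - 10
se_diff = math.sqrt(10**2 + 10**2)
z_diff, p_diff = two_sided_p(diff, se_diff)

print(f"study 1:    z = {z1:.2f}, p = {p1:.4f}")
print(f"study 2:    z = {z2:.2f}, p = {p2:.4f}")
print(f"difference: {diff} +/- {se_diff:.2f}, z = {z_diff:.2f}, p = {p_diff:.4f}")
```

The first study sits 2.5 standard errors from zero and the second only 1, yet the difference between them is only about 1.06 standard errors from zero, so it is nowhere near conventional significance.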

In your case, when you conduct three separate Wilcoxon tests, you might get p-values of, e.g., $0.045$, $0.045$, and $0.055$. The first two are "significant" according to the common $p<0.05$ criterion, and the third one is not. However, the differences between these p-values are tiny, so it is quite possible that when you compare the three groups with each other, you will fail to find any significant difference. This seems to be exactly your case.

In addition: doing Kruskal-Wallis on the pre and post measures separately is probably not the best approach. You can subtract pre from post and do one Kruskal-Wallis on these differences. It is of course still possible (as I explain above) that you will not get a significant difference, but this is a more correct approach.
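A minimal sketch of this "one Kruskal-Wallis on the differences" approach, using only the Python standard library. The pre/post numbers are invented purely for illustration (they are not the asker's data), the helper names are hypothetical, no tie correction is applied, and the p-value uses the chi-square approximation, which is rough for groups this small. In practice one would normally call a library routine such as `scipy.stats.kruskal` on the three lists of differences.

```python
import math

# Hypothetical pre/post measurements for three drugs (illustrative values only).
pre = {
    "A": [12.1, 15.3, 11.0, 14.2, 13.5, 16.4],
    "B": [13.0, 14.8, 12.2, 15.1, 11.7, 16.0],
    "C": [12.6, 15.0, 11.4, 14.5, 13.1, 15.8],
}
post = {
    "A": [9.0, 11.1, 10.2, 9.5, 10.9, 10.0],
    "B": [10.5, 11.0, 11.3, 10.0, 10.2, 11.1],
    "C": [11.2, 12.7, 11.7, 11.5, 12.4, 10.5],
}

# Step 1: one change score (post minus pre) per subject.
diffs = {d: [b - a for a, b in zip(pre[d], post[d])] for d in pre}

def average_ranks(values):
    """1-based ranks of values; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

# Step 2: rank all change scores together, then sum the ranks within each drug.
labels = [d for d in diffs for _ in diffs[d]]
pooled = [x for d in diffs for x in diffs[d]]
ranks = average_ranks(pooled)
rank_sums = {d: sum(r for r, lab in zip(ranks, labels) if lab == d) for d in diffs}

# Step 3: Kruskal-Wallis H statistic (no tie correction; this data has no ties).
n_total = len(pooled)
H = 12 / (n_total * (n_total + 1)) * sum(
    rank_sums[d] ** 2 / len(diffs[d]) for d in diffs
) - 3 * (n_total + 1)

# Chi-square approximation with k - 1 = 2 degrees of freedom; for exactly
# 2 degrees of freedom the upper tail probability is exp(-H / 2).
p_value = math.exp(-H / 2)

print("rank sums:", rank_sums)
print(f"H = {H:.3f}, approximate p = {p_value:.3f}")
```

With these made-up numbers every drug shows mostly negative change scores, yet the test across drugs is far from significant, which mirrors the situation described in the question.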

Just to stress it again: if one drug shows a significant pre-post difference and another shows a non-significant one, that is (by itself) no reason whatsoever to believe that one drug is better than the other. Unfortunately, this is a very widespread mistake.
