ANOVA – How to Interpret the P-value in a Dunn Test?

anova, dunn-test, kruskal-wallis-test, multiple-comparisons, nonparametric

I conducted a Kruskal-Wallis test on my data, and this is what I got:

Kruskal-Wallis chi-squared = 12.138, df = 4, p-value = 0.01635
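For context, that is the output format of base R's kruskal.test(), produced by a call of roughly this form (the variable names are assumed from the Dunn test call below):

kruskal.test(personal.value ~ income, data = ordinal)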

The p-value is below 0.05, so I know there is a significant difference among the groups I compared. So I ran a Dunn test to find where the differences lie:

library(FSA)   # dunnTest() is provided by the FSA package

dunnTest(personal.value ~ income,
         data = ordinal,
         method = "bh")

And this is what I got:

[Output table of the pairwise Dunn comparisons with BH-adjusted p-values, omitted here.]

Yet, none of the adjusted p-values are below 0.05. Does that mean I can accept the null hypothesis? How do I interpret this?

Best Answer

The main effect for a factor may be significant at the 5% level, but the method of post hoc testing (adjusted to avoid 'false discovery' across multiple comparisons on the same data) may use a somewhat different criterion for judging differences than the main test does. In relatively rare cases, that can lead to your situation, where none of the post hoc comparisons between any two levels of a 'significant' factor turns out to be judged significant.
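As an illustration, here is a minimal simulation sketch; the group sizes, shifts, and seed are made up, and it assumes the FSA package used in the question. With small true differences, some draws reject the omnibus Kruskal-Wallis test even though no BH-adjusted Dunn comparison falls below 0.05:

library(FSA)                                              # dunnTest()
set.seed(2024)                                            # arbitrary seed
d <- data.frame(g = factor(rep(1:5, each = 20)))          # five levels, 20 obs each
d$y <- rnorm(100, mean = c(0, 0.2, 0.3, 0.5, 0.6)[d$g])   # small, made-up shifts
kruskal.test(y ~ g, data = d)                             # omnibus test
dunnTest(y ~ g, data = d, method = "bh")                  # pairwise, BH-adjusted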

IMHO, at the very least, you ought to be able to claim that the largest pairwise difference (in mean ranks) among your five levels is significant at a level only slightly above 5%. Is that the comparison of level 2 vs. 4? Even the Dunn 'adjusted P-value' for that comparison is significant at the 6% level. (There is nothing 'sacred' about the 5% level.)

By looking at all ${5 \choose 2} = 10$ post hoc comparisons among levels of this factor, you may be paying a penalty, with adjusted P-values larger than necessary to avoid false discovery.
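To see the size of that penalty, here is a minimal sketch using base R's p.adjust() on ten made-up raw P-values; the only thing taken from the question is the method, "BH", the same Benjamini-Hochberg correction that dunnTest() applies with method = "bh":

p.raw <- c(0.004, 0.03, 0.06, 0.10, 0.15, 0.22, 0.35, 0.48, 0.61, 0.80)
p.adjust(p.raw, method = "BH")   # the smallest becomes 0.004 * 10/1 = 0.04

BH multiplies each ordered P-value by $m/\text{rank}$ (here $m = 10$ comparisons) and then enforces monotonicity, so even the strongest comparison pays roughly a tenfold penalty.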

By contrast, if you were to break the rules, doing post hoc comparisons for an effect that is not quite significant at the 5% level, you might occasionally find a comparison between levels that is "significant" at an adjusted significance level of 5%. Again, that might be because the main and post hoc tests use slightly different criteria. Fortunately, that discrepancy is not often noticed, because most people know not to do comparisons among levels unless the main effect is found to be significant.

Furthermore, there is no guarantee that you will be able to rank the levels of a factor definitively. For example, with five levels, you might establish that the smallest (level 4) and the largest (level 5) are significantly different at the 5% level, but not be able to tell whether the intermediate levels 1, 2, and 3 differ from 4, from 5, or from one another. Happily, the differences that cannot be resolved may often be too small to be of practical importance.

In general, you can reduce the risk of failing to distinguish important differences by doing a power and sample-size analysis at the start of the study, to make sure there are enough replications per level to resolve differences large enough to be of practical interest.
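One way to do that for a Kruskal-Wallis design is a simulation-based power calculation. The sketch below is only illustrative: the normal model, the shifts, and the sample size are placeholder assumptions to be replaced with values meaningful for the study at hand.

# Estimate the power of the Kruskal-Wallis test by simulation.
# All effect sizes and sample sizes below are illustrative placeholders.
kw.power <- function(n.per.level, shifts, nsim = 2000, alpha = 0.05) {
  g <- factor(rep(seq_along(shifts), each = n.per.level))
  mean(replicate(nsim, {
    y <- rnorm(length(g), mean = shifts[g])   # data under the assumed effects
    kruskal.test(y ~ g)$p.value < alpha       # did the omnibus test reject?
  }))
}

kw.power(n.per.level = 20, shifts = c(0, 0.2, 0.3, 0.5, 0.6))

Increase n.per.level until the estimated power reaches the target (often 80% or 90%).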