Solved – Dunn’s test: is it appropriate to use p-values to interpret relative between-group similarity

Tags: dunn-test, interpretation, kruskal-wallis-test, p-value

For a set of data that I collected, I used the Kruskal-Wallis test to determine whether there were between-group differences in the parameter I measured. It returned a significant result, so to identify pairwise differences between my five treatment groups, I applied Dunn's test for multiple comparisons, which yielded these results:

[Table summarizing Dunn's multiple comparisons test results.]

I would normally use a cutoff of p < 0.05 to determine significance, but because of the multiple comparisons I used a Bonferroni correction. Five groups were compared (AB, Control, SM, DSV, and SW), resulting in ten pairwise comparisons. Thus I assessed significance at p < 0.05/10 (= 0.005). In the table, * indicates significance at the 0.05 level and ** at the corrected (0.005) level.
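The arithmetic above can be sketched in a few lines (the group count and alpha are taken from the question; everything else is standard):

```python
# Number of pairwise comparisons among k = 5 groups, and the
# resulting Bonferroni-corrected per-comparison threshold.
from math import comb

k = 5                  # treatment groups: AB, Control, SM, DSV, SW
m = comb(k, 2)         # pairwise comparisons: "5 choose 2" = 10
alpha = 0.05
corrected = alpha / m  # Bonferroni correction: 0.05 / 10 = 0.005

print(m, corrected)
```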

I can tell from the table which groups have statistically different rank means from one another (i.e. the groups are different).

My question: is it appropriate to evaluate relative between-group differences based on the magnitude of the p-values? For example, is it appropriate to say, based on the data provided, that the AB-Control groups are the most similar to one another? And that Control-SM have the second-highest degree of similarity out of all the comparisons made here?

Best Answer

From my interpretation of what you are asking, I would say no, you should not compare the magnitude of the differences based on the p-values alone.

The z-scores for the comparisons depend on the mean rank differences, as you say, but also on the sample sizes of the groups being compared. There is a good community wiki on Dunn's test here with more info.

Therefore the p-values are determined not just by the actual differences between the groups, but also by the sampling effort that you put in for each treatment group. Groups with less sampling effort will have less significant p-values for the same mean rank difference.
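A quick sketch makes this concrete. The sample sizes and rank difference below are made up for illustration; the z-statistic is the standard Dunn form (mean rank difference divided by its standard error, ignoring the tie correction):

```python
# Same mean rank difference, different sample sizes -> different p-values.
from math import sqrt, erfc

def dunn_z(mean_rank_diff, N, n_i, n_j):
    """Dunn's z: rank difference / SE, where SE uses the total N and
    the two group sizes (tie correction omitted for brevity)."""
    se = sqrt((N * (N + 1) / 12) * (1 / n_i + 1 / n_j))
    return mean_rank_diff / se

N = 50                                     # hypothetical total sample size
z_big   = dunn_z(15, N, n_i=15, n_j=15)    # well-sampled pair
z_small = dunn_z(15, N, n_i=5,  n_j=5)     # sparsely sampled pair

# Two-sided p-value from the standard normal, via erfc.
p_big   = erfc(abs(z_big) / sqrt(2))
p_small = erfc(abs(z_small) / sqrt(2))
print(p_small > p_big)   # True: same rank difference, weaker evidence
```

The identical rank difference of 15 is "significant-looking" for the well-sampled pair but not for the sparsely sampled one, which is exactly why p-value magnitude is a poor proxy for similarity.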

I suggest you calculate confidence intervals for your mean rank differences, and compare those. This will make explicit the greater uncertainty in the size of the difference for any groups with smaller sample sizes.
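One way to sketch such an interval (all numbers hypothetical) is to use the same standard error as Dunn's z and a Bonferroni-adjusted critical value:

```python
# Bonferroni-adjusted normal-approximation CI for one mean rank difference.
from math import sqrt
from statistics import NormalDist

N, n_i, n_j = 50, 10, 10   # made-up total and pair sample sizes
mean_rank_diff = 12.0      # made-up observed difference in mean ranks
m = 10                     # number of pairwise comparisons
alpha = 0.05 / m           # Bonferroni-adjusted level

se = sqrt((N * (N + 1) / 12) * (1 / n_i + 1 / n_j))
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
ci = (mean_rank_diff - z_crit * se, mean_rank_diff + z_crit * se)
print(ci)
```

Wider intervals for the sparsely sampled pairs then display the extra uncertainty directly, instead of hiding it inside a p-value.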