Kruskal-Wallis test: assumption testing and interpretation of the results

Tags: assumptions, heteroscedasticity, interpretation, kruskal-wallis test, nonparametric

There is a chapter on the Kruskal-Wallis (KW) test on the Influential Points website, and it contains some quotes I'm not sure I understand correctly:

Quote 1:

Some authors state unambiguously that there are no distributional
assumptions, others that the homogeneity of variances assumption
applies […]
If you wish to compare medians or means, then the Kruskal-Wallis test
also assumes that observations in each group are identically and
independently distributed apart from location. If you can accept
inference in terms of dominance of one distribution over another, then
there are indeed no distributional assumptions.
[link to chapter]

Quote 2:

…heterogeneous variances will make interpretation of the result more
complex…
[link to chapter]

My questions:

  1. Suppose I analyze the chickwts dataset, which is included in base R (a boxplot of the data is below), and say it meets all required assumptions. In practical terms, from a biologist's point of view, how does the interpretation of the Kruskal-Wallis results change if I carry out the KW test as a test for medians versus as a test for stochastic dominance? What can I conclude from the data in each case?
  2. From quote 2 I infer that I should carry out Levene's/Brown-Forsythe test to check for heteroscedasticity. Am I right? If so, how does the result of Levene's test influence the interpretation of the Kruskal-Wallis test?
  3. Should I carry out other statistical tests (e.g., the Kolmogorov-Smirnov test) or make particular plots (e.g., a QQ plot for each pair of groups) to check whether the distributions of the data in each group have approximately the same shape?

The dataset:

data(chickwts)
boxplot(weight~feed, data = chickwts, las = 3)

[Figure: boxplot of weight by feed for the chickwts data]

Best Answer

The KW test (like the Mann-Whitney U-test) is essentially always a test for stochastic dominance. That is, it tests whether there exists at least one group such that, if you drew a value at random from each group, you would typically get a larger (or smaller) value from it than from the rest.

People assume this means that one median or mean must be greater than the other, but that isn't necessarily true. If the shapes and the variances of the distributions are identical (i.e., one group's distribution is just shifted up or down relative to the other), then stochastic dominance implies a greater mean and median (and also a greater third quartile, fifth percentile, etc.). However, if the shapes / variances of the distributions differ, then it isn't necessarily the case. For further discussion of these topics and to see an example where the means are switched, see my answer here: Wilcoxon-Mann-Whitney test giving surprising results. For an example where the medians are equal, but there is nonetheless a stochastically dominant group, consider this:

g1 <- c(rep(0, 11), 1:10)                # group 1 has 11 0s, & then 1 to 10
g2 <- g3 <- g4 <- c(-10:-1, rep(0, 11))  # the other groups have 11 0s, & -1 to -10
d  <- stack(list(g1=g1, g2=g2, g3=g3, g4=g4))
aggregate(values~ind, d, median)        # the median of every group is 0
#   ind values
# 1  g1      0
# 2  g2      0
# 3  g3      0
# 4  g4      0


kruskal.test(values~ind, d)  # the KW test is highly significant nonetheless
#   Kruskal-Wallis rank sum test
# 
# data:  values by ind
# Kruskal-Wallis chi-squared = 28.724, df = 3, p-value = 2.559e-06
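
To make "stochastically dominant" concrete, here is a minimal sketch that estimates how often a random draw from group 1 exceeds a random draw from group 2 in the toy data, counting ties as half. The name `p_superiority` is just an illustrative label I am introducing, not something the KW test reports; a value of 0.5 would mean neither group tends to give larger values.

```r
# Standalone copy of two of the toy groups: both have median 0
g1 <- c(rep(0, 11), 1:10)
g2 <- c(-10:-1, rep(0, 11))

# Compare every value in g1 with every value in g2 (21 x 21 pairs)
greater <- outer(g1, g2, ">")
tied    <- outer(g1, g2, "==")

# Estimated P(X > Y) + 0.5 * P(X == Y) for X from g1 and Y from g2
# (p_superiority is a hypothetical name used only for this illustration)
p_superiority <- mean(greater) + 0.5 * mean(tied)
print(p_superiority)

# The medians are nonetheless identical
print(c(g1 = median(g1), g2 = median(g2)))
```

The estimate is well above 0.5 even though the two medians are equal, which is the sense in which group 1 dominates here.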

With this understanding in mind, we can answer your specific questions.

  1. If the distributions within each group (of chicks) / condition (feed type) have the same shape and variance, a significant KW test implies there is at least one group that is stochastically greater (lesser) than the others, and its mean (and median, and first quartile, and eighty-eighth percentile, etc.) is higher (lower) than the other groups. If the distributions differ in shape and/or variance, a significant KW test implies there is at least one group that is stochastically greater (lesser) than the others, but its mean (and median, and first quartile, and eighty-eighth percentile, etc.) is not necessarily higher (lower) than the other groups.
  2. I would not bother running Levene's test before KW.
  3. I would not bother running the Kolmogorov-Smirnov test before KW. Examining qq-plots seems reasonable.
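
For the chickwts data, the points above could be put into practice roughly as follows. This is only a sketch of one possible workflow: the object names (`kw`, `grp_summary`, `centered`) are my own, the choice of casein vs. horsebean for the QQ plot is arbitrary, and centering each group at its median before plotting is an assumption I am making so that only differences in shape and spread remain visible.

```r
data(chickwts)

# Overall KW test: is at least one feed group stochastically different?
kw <- kruskal.test(weight ~ feed, data = chickwts)
print(kw)

# Median and IQR per feed group, to judge whether a pure location shift is plausible
grp_summary <- aggregate(weight ~ feed, data = chickwts,
                         FUN = function(w) c(median = median(w), IQR = IQR(w)))
print(grp_summary)

# Subtract each group's median so that location differences are removed
centered <- split(chickwts$weight - ave(chickwts$weight, chickwts$feed, FUN = median),
                  chickwts$feed)

# QQ plot for one (arbitrarily chosen) pair of groups; points near the dashed
# line suggest similar shape and spread after removing the location shift
qqplot(centered$casein, centered$horsebean,
       xlab = "casein (median-centered)", ylab = "horsebean (median-centered)")
abline(0, 1, lty = 2)
```

If the centered groups look similar, reading a significant result as a difference in medians is reasonable; if they do not, the safer reading is that at least one feed tends to give heavier (or lighter) chicks than the others.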