There is a chapter on the Kruskal-Wallis (KW) test on the Influential Points website, and it contains some quotes I'm not sure I understand correctly:
Quote 1:
Some authors state unambiguously that there are no distributional
assumptions, others that the homogeneity of variances assumption
applies […]
If you wish to compare medians or means, then the Kruskal-Wallis test
also assumes that observations in each group are identically and
independently distributed apart from location. If you can accept
inference in terms of dominance of one distribution over another, then
there are indeed no distributional assumptions.
[link to chapter]
Quote 2:
…heterogeneous variances will make interpretation of the result more
complex…
[link to chapter]
My questions:
- For instance, suppose I analyze the dataset chickwts, which is included in base R (a boxplot of the data is shown below), and, say, it meets all required assumptions. In practical terms, from a biologist's point of view, how does the interpretation of the Kruskal-Wallis test results change if I carry out the KW test as a test for medians rather than as a test for stochastic dominance? What can I conclude from the data in each case?
- From quote 2 I infer that I should carry out Levene's/Brown-Forsythe test to check for heteroscedasticity. Am I right? If so, how does the result of Levene's test influence the interpretation of the Kruskal-Wallis test?
- Should I carry out other statistical tests (e.g., the Kolmogorov-Smirnov test) or make special plots (e.g., a QQ plot for each pair of groups) to check whether the distributions of the data in each group have approximately the same shape?
The dataset:
data(chickwts)
boxplot(weight~feed, data = chickwts, las = 3)
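For reference, a minimal sketch of how the test itself could be run on these data, together with the sample medians per group. It uses only base R (kruskal.test and tapply); the object name kw is just a placeholder chosen for illustration.

```r
data(chickwts)

# Kruskal-Wallis rank sum test of weight across the six feed types
kw <- kruskal.test(weight ~ feed, data = chickwts)
kw

# Sample median weight per feed, for comparison with the boxplot
tapply(chickwts$weight, chickwts$feed, median)
```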
Best Answer
The KW test (like the Mann-Whitney U-test) is essentially always a test for stochastic dominance. That is, it tests whether there is at least one group from which a randomly drawn value would typically be larger (or smaller) than a value drawn at random from each of the other groups.
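To make "typically larger" concrete, here is a minimal sketch that estimates the probability that a random value from one group exceeds a random value from another by comparing all pairs. The helper name prob_greater is hypothetical (it is not a base R function), and the two feeds from chickwts are picked only for illustration.

```r
# Hypothetical helper (not part of base R): proportion of all (x, y) pairs
# in which x exceeds y, counting ties as one half.
prob_greater <- function(x, y) {
  mean(outer(x, y, ">") + 0.5 * outer(x, y, "=="))
}

data(chickwts)
casein    <- chickwts$weight[chickwts$feed == "casein"]
horsebean <- chickwts$weight[chickwts$feed == "horsebean"]

# Estimated P(casein weight > horsebean weight) for one random chick from each
p_hat <- prob_greater(casein, horsebean)
p_hat
```

A value near 0.5 would mean neither group tends to give larger values; values near 0 or 1 indicate that one group strongly dominates the other.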
People often assume this means that one median or mean must be greater than the other, but that isn't necessarily true. If the shapes and the variances of the distributions are identical (i.e., one group's distribution is just shifted up or down relative to the other), then stochastic dominance implies a greater mean and median (and also a greater third quartile, fifth percentile, etc.). However, if the shapes or variances of the distributions differ, this need not hold. For further discussion of these topics and to see an example where the means are switched, see my answer here: Wilcoxon-Mann-Whitney test giving surprising results. For an example where the medians are equal, but there is nonetheless a stochastically dominant group, consider this:
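One way to build such a case, as a sketch: the data below are constructed by hand purely for illustration (the specific values are assumptions, not real measurements), so that both groups have a median of exactly 0 while g2 tends to give larger values than g1.

```r
# Hand-built illustrative data (an assumption, not real measurements).
# Each group has 50 values below 0, a single 0, and 50 values above 0,
# so both medians are exactly 0.
g1 <- c(seq(-10, -0.2, length.out = 50), 0, seq(0.02, 1, length.out = 50))
g2 <- c(seq(-1, -0.02, length.out = 50), 0, seq(0.2, 10, length.out = 50))

median(g1)  # 0
median(g2)  # 0

# Estimated probability that a random draw from g2 exceeds one from g1
# (ties counted as one half)
p_g2_gt_g1 <- mean(outer(g2, g1, ">") + 0.5 * outer(g2, g1, "=="))
p_g2_gt_g1

# Kruskal-Wallis test on the two groups
values <- c(g1, g2)
group  <- factor(rep(c("g1", "g2"), each = length(g1)))
kw <- kruskal.test(values ~ group)
kw
```

The medians are identical, yet the test should reject, because a draw from g2 exceeds a draw from g1 well over half the time. The spreads are arranged asymmetrically on purpose: g1 is wide below 0 and narrow above it, and g2 is the reverse.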
With this understanding in mind, we can answer your specific questions.