Solved – How to plot the results of a Kruskal-Wallis or a Welch’s ANOVA

anova, data visualization, kruskal-wallis test, nonparametric

Let's assume that I have two datasets in which I have measured three continuous variables. In both datasets, these variables were measured on three groups of observations, and each group contains only 5 observations.
I would like to know whether, within each of these two sets of groups, some groups tend to have higher values for the measured variables. As my sample sizes are small and the observations appear to be non-normally distributed, I would like to use a non-parametric method. Consequently, I use a Kruskal-Wallis test on the first dataset, and Welch's ANOVA on the second one (because some data are heteroscedastic).
My question is: what is the appropriate way to plot the results of a Kruskal-Wallis test? And what is the correct way to plot the results of a Welch's ANOVA?

Most people seem to use boxplots for such a purpose. Yet, if I understand correctly, the Kruskal-Wallis test does not exactly compare medians (but mean ranks), and boxplots do not give any information on the means of heteroscedastic samples (and are thus ill-suited for Welch's ANOVA results). So what should I use to plot my results?
On someone's blog, I read about plotting the groups' distributions. But since I only have very small sample sizes, I fear that it would not be very clear and/or informative (of course, I realize that with only 5 observations per group, my tests are not very informative themselves).

Best Answer

As you already mentioned, the Kruskal-Wallis test is a test of significance based on ranks. In my opinion, however, plotting the ranks alone isn't that helpful for a reader trying to understand the underlying response variable. Instead, I would plot the individual data points (including the median for descriptive purposes) plus the mean ranks as differently colored points. To make it clear, you could also place the letters indicating significant differences next to the points indicating the ranks. You can also report everything you don't want to plot (e.g. the ranks as separate points) in a separate table (see example below).

I am not sure which software package you are using, but below is an example using R to illustrate what I mentioned above. (Note: this approach may not look nice if the numerical values of the data points and the ranks differ greatly. In that case, I would plot the data points and the significant differences via letters, and report the ranks in a separate table.)

### required packages
require(tidyverse)
#> Loading required package: tidyverse
require(agricolae)
#> Loading required package: agricolae
### set seed for reproducibility
set.seed(564)

### subset the PlantGrowth dataset (available in R) to replicate your n=5 scenario
PlantGrowth %>% 
  group_by(group) %>% 
  slice(sample(1:5)) -> d_sub

### run Kruskal test from the agricolae package
k <- kruskal(d_sub$weight, d_sub$group, console = TRUE)
#> 
#> Study: d_sub$weight ~ d_sub$group
#> Kruskal-Wallis test's
#> Ties or no Ties
#> 
#> Critical Value: 3.290877
#> Degrees of freedom: 2
#> Pvalue Chisq  : 0.192928 
#> 
#> d_sub$group,  means of the ranks
#> 
#>      d_sub.weight r
#> ctrl          8.3 5
#> trt1          5.3 5
#> trt2         10.4 5
#> 
#> Post Hoc Analysis
#> 
#> t-Student: 2.178813
#> Alpha    : 0.05
#> Minimum Significant Difference: 5.816519 
#> 
#> Treatments with the same letter are not significantly different.
#> 
#>      d_sub$weight groups
#> trt2         10.4      a
#> ctrl          8.3      a
#> trt1          5.3      a

### create summary table incl. mean rank sums and significant differences letters
(t_comp <- k$means %>% 
    rownames_to_column(var = "group") %>%
    rename(weight = d_sub.weight) %>%
    as_tibble() %>% 
    left_join(as_tibble(k$groups), by = c("rank" = "d_sub$weight")))
#> # A tibble: 3 x 11
#>   group weight  rank   std     r   Min   Max   Q25   Q50   Q75 groups
#>   <chr>  <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> 
#> 1 ctrl    5.11   8.3 0.788     5  4.17  6.11  4.5   5.18  5.58 a     
#> 2 trt1    4.57   5.3 0.851     5  3.59  5.87  4.17  4.41  4.81 a     
#> 3 trt2    5.57  10.4 0.446     5  5.12  6.31  5.37  5.5   5.54 a

### create plot with ranks as blue dots and align the letters next to them
d_sub %>% 
  ggplot(aes(x = group, y = weight)) +
  geom_point(color = "grey50", size = 2) +
  # add ranks as separate points
  geom_point(data = t_comp, aes(x = group, y = rank), col = "blue", size = 3) + 
  # add median as horizontal line
  stat_summary(fun = median, geom = "errorbar", aes(ymax = after_stat(y), ymin = after_stat(y)),
               width = .75, col = "red") +
  # add letters
  geom_text(data = t_comp, aes(x = group, y = rank, label = groups), size = 6, nudge_x = -0.1)

Created on 2020-01-30 by the reprex package (v0.3.0)
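
The example above only covers the Kruskal-Wallis part of the question. For the Welch's ANOVA part, one option along the same lines is to plot the individual data points together with each group's mean and a confidence interval computed from that group's own standard deviation (no pooled variance), since Welch's ANOVA compares means without assuming equal variances. The following is only a minimal base-R sketch under stated assumptions: it reuses the first five rows per group of PlantGrowth (the same rows the subset above selects) as stand-in data, and the t-based 95% intervals are a descriptive choice of mine, not something prescribed by the test.

```r
# Assumption: first five rows per group of PlantGrowth stand in for the real data
d <- do.call(rbind, lapply(split(PlantGrowth, PlantGrowth$group), head, 5))

# Welch's ANOVA: compares group means without assuming equal variances
w <- oneway.test(weight ~ group, data = d, var.equal = FALSE)

# Per-group mean and t-based 95% CI, each using that group's own SD (no pooling)
ci <- do.call(rbind, lapply(split(d$weight, d$group), function(x) {
  n    <- length(x)
  half <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)
  data.frame(mean = mean(x), lower = mean(x) - half, upper = mean(x) + half)
}))

# Raw data points first, then means (red diamonds) and intervals on top
stripchart(weight ~ group, data = d, vertical = TRUE, method = "jitter",
           jitter = 0.05, pch = 16, col = "grey50",
           ylim = range(d$weight, ci$lower, ci$upper),
           main = sprintf("Welch's ANOVA: F = %.2f, p = %.3f",
                          w$statistic, w$p.value))
xs <- seq_len(nrow(ci))
points(xs, ci$mean, pch = 18, col = "red", cex = 1.5)
arrows(xs, ci$lower, xs, ci$upper, angle = 90, code = 3,
       length = 0.08, col = "red")
```

With only 5 observations per group the intervals will be wide, which is arguably informative in itself; showing the raw points alongside them keeps the small sample size visible to the reader.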

Related Solutions

Kruskal-Wallis Test – Data Considerations for Kruskal-Wallis Test

With small, and possibly unequal group sizes, I'd go with chl's and onestop's suggestion and do a Monte-Carlo permutation test. For the permutation test to be valid, you need exchangeability under $H_{0}$. If all distributions have the same shape (and are therefore identical under $H_{0}$), this is true.

Here's a first try at looking at the case of 3 groups and no ties. First, let's compare the asymptotic $\chi^{2}$ distribution function against a MC-permutation one for given group sizes (this implementation will break for larger group sizes).

P  <- 3                     # number of groups
Nj <- c(4, 8, 6)            # group sizes
N  <- sum(Nj)               # total number of subjects
IV <- factor(rep(1:P, Nj))  # grouping factor
alpha <- 0.05               # alpha-level

# there are N! permutations of ranks within the total sample, but we only want 5000
nPerms <- min(factorial(N), 5000)

# random sample of all N! permutations
# sample(1:factorial(N), nPerms) doesn't work for N! >= .Machine$integer.max
permIdx <- unique(round(runif(nPerms) * (factorial(N)-1)))
nPerms  <- length(permIdx)
H       <- numeric(nPerms)  # vector to later contain the test statistics

# function to calculate test statistic from a given rank permutation
getH <- function(ranks) {
    Rj <- tapply(ranks, IV, sum)
    (12 / (N*(N+1))) * sum((1/Nj) * (Rj-(Nj*(N+1) / 2))^2)
}

# all test statistics for the random sample of rank permutations (breaks for larger N)
# numperm() internally orders all N! permutations and returns the one with a desired index
library(sna)                # for numperm()
for(i in seq(along=permIdx)) { H[i] <- getH(numperm(N, permIdx[i]-1)) }

# cumulative relative frequencies of test statistic from random permutations
pKWH   <- cumsum(table(round(H, 4)) / nPerms)
qPerm  <- quantile(H, probs=1-alpha)  # critical value for level alpha from permutations
qAsymp <- qchisq(1-alpha, P-1)        # critical value for level alpha from chi^2

# illustration of cumRelFreq vs. chi^2 distribution function and resp. critical values
plot(names(pKWH), pKWH, main="Kruskal-Wallis: permutation vs. asymptotic",
     type="n", xlab="h", ylab="P(H <= h)", cex.lab=1.4)
points(names(pKWH), pKWH, pch=16, col="red")
curve(pchisq(x, P-1), lwd=2, n=200, add=TRUE)
abline(h=0.95, col="blue")                         # level alpha
abline(v=c(qPerm, qAsymp), col=c("red", "black"))  # critical values
legend(x="bottomright", legend=c("permutation", "asymptotic"),
       pch=c(16, NA), col=c("red", "black"), lty=c(NA, 1), lwd=c(NA, 2))


Now for an actual MC-permutation test. This compares the asymptotic $\chi^{2}$-derived p-value with the result from coin's oneway_test() and the cumulative relative frequency distribution from the MC-permutation sample above.

> DV1 <- round(rnorm(Nj[1], 100, 15), 2)  # data group 1
> DV2 <- round(rnorm(Nj[2], 110, 15), 2)  # data group 2
> DV3 <- round(rnorm(Nj[3], 120, 15), 2)  # data group 3
> DV  <- c(DV1, DV2, DV3)                 # all data
> kruskal.test(DV ~ IV)                   # asymptotic p-value
Kruskal-Wallis rank sum test
data:  DV by IV 
Kruskal-Wallis chi-squared = 7.6506, df = 2, p-value = 0.02181

> library(coin)                           # for oneway_test()
> oneway_test(DV ~ IV, distribution=approximate(B=9999))
Approximative K-Sample Permutation Test
data:  DV by IV (1, 2, 3) 
maxT = 2.5463, p-value = 0.0191

> Hobs <- getH(rank(DV))                  # observed test statistic

# proportion of test statistics at least as extreme as observed one (+1)
> (pPerm <- (sum(H >= Hobs) + 1) / (length(H) + 1))
[1] 0.0139972
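
As noted above, the numperm()-based sampling of permutation indices breaks for larger group sizes. A variant that avoids indexing all N! permutations is to shuffle the group labels directly with sample() and recompute the Kruskal-Wallis statistic each time. The sketch below is only an illustration under assumptions: the seed, the simulated data, and the number of resamples B are arbitrary choices, and the helper name kwH is made up for this example.

```r
set.seed(1)                                # assumption: arbitrary seed
Nj <- c(4, 8, 6)                           # group sizes as in the example above
IV <- factor(rep(seq_along(Nj), Nj))       # grouping factor
DV <- c(rnorm(Nj[1], 100, 15),             # simulated data, one block per group
        rnorm(Nj[2], 110, 15),
        rnorm(Nj[3], 120, 15))

# hypothetical helper: Kruskal-Wallis H for data y and grouping g
kwH <- function(y, g) unname(kruskal.test(y, g)$statistic)

Hobs  <- kwH(DV, IV)                       # observed test statistic
B     <- 2000                              # assumption: number of label shuffles
Hperm <- replicate(B, kwH(DV, sample(IV))) # H under randomly permuted labels

# proportion of permuted statistics at least as extreme as the observed one (+1)
pPerm  <- (sum(Hperm >= Hobs) + 1) / (B + 1)
# asymptotic chi^2 p-value for comparison
pAsymp <- pchisq(Hobs, df = length(Nj) - 1, lower.tail = FALSE)

c(permutation = pPerm, asymptotic = pAsymp)
```

Because each shuffle only permutes the labels, the cost grows with B rather than with N!, so the same code should also run for larger group sizes; the exchangeability requirement under the null hypothesis still applies.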

Solved – ANOVA / Kruskal-Wallis: one of four groups has different distribution

As ttnphns commented, neither the Kruskal-Wallis test nor the rank sum test makes any assumption about distributional similarity between groups. A point of confusion sometimes arises with these tests: in the most general sense they are tests for stochastic dominance (e.g., $H_{0}\text{: } P(X_{A} > X_{B}) = \frac{1}{2}$), but with two additional assumptions, (1) that the distributions have the same shape and (2) that any differences between the distributions of the groups are differences of central location, the tests can be interpreted as tests for median difference (e.g., $H_{0}\text{: } \tilde{x}_{A} = \tilde{x}_{B}$).

Therefore, significance is not an issue, and there is nothing to "mitigate." However, the substantive interpretation (i.e. stochastic dominance versus a difference in medians, means, etc.) will depend on whether those two additional assumptions hold.
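
To make the stochastic dominance reading concrete, the quantity $P(X_{A} > X_{B})$ can be estimated from two samples as the proportion of pairs in which the observation from A exceeds the one from B. In R, this proportion equals the W statistic reported by wilcox.test() divided by the number of pairs. The sketch below uses made-up data from two deliberately different-shaped distributions (an assumption for illustration only), so it is a case where the median interpretation would not be warranted.

```r
set.seed(42)                               # assumption: arbitrary seed
xA <- rnorm(5, mean = 5.5, sd = 0.5)       # made-up group A (symmetric)
xB <- rexp(5, rate = 1 / 5)                # made-up group B (right-skewed)

# proportion of (A, B) pairs with A > B, counting ties as one half
pHat <- mean(outer(xA, xB, ">") + 0.5 * outer(xA, xB, "=="))

# the rank sum statistic W from wilcox.test() counts exactly those pairs
W <- unname(wilcox.test(xA, xB)$statistic)
stopifnot(isTRUE(all.equal(pHat, W / (length(xA) * length(xB)))))

# an estimate far from 0.5 suggests one group tends to have larger values
c(P_hat = pHat, median_A = median(xA), median_B = median(xB))
```

Reporting this estimate next to the test result is one way to describe the effect when the equal-shape assumption is doubtful.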
