Solved – How to plot the results of a Kruskal-Wallis or a Welch’s ANOVA

anova, data visualization, kruskal-wallis test, nonparametric

Let's assume that I have two datasets in which I have measured three continuous variables. In both datasets, these variables were measured on three groups of observations, and each group contains only 5 observations.
I would like to know whether, within each of these two sets of groups, some groups tend to have higher values for the measured variables. As my sample sizes are small and the observations appear to be non-normally distributed, I would like to use a non-parametric method. Consequently, I use a Kruskal-Wallis test on the first dataset, and Welch's ANOVA on the second one (because some data are heteroscedastic).
My question is what is the appropriate way to plot the results of a Kruskal-Wallis test? And what is the correct way to plot the results of a Welch's ANOVA?

Most people seem to use boxplots for such a purpose. Yet, if I understand correctly, the Kruskal-Wallis test does not exactly compare medians (but mean ranks), and boxplots do not give any information on the means of heteroscedastic samples (and are thus ill-suited for Welch's ANOVA results). So what should I use to plot my results?
On someone's blog, I read about plotting the groups' distributions. But since I only have very small sample sizes, I fear that it would not be very clear and/or informative (of course, I realize that with only 5 observations per group, my tests are not very informative themselves).

Best Answer

As you already mentioned, the Kruskal-Wallis test is a significance test based on ranks. In my opinion, however, plotting only the ranks does not really help the reader understand the underlying response variable. Instead, I would plot the individual data points (including the median for descriptive purposes) plus the mean ranks as differently colored points. To make the result clear, you could also place the letters indicating significant differences next to the points that show the mean ranks. Anything you don't want to plot (e.g. the ranks as separate points) can be reported in a separate table (see example below).
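To make "based on ranks" concrete, here is a minimal base R sketch that computes the mean rank per group by hand. It assumes the built-in PlantGrowth data, cut down to the first five rows of each group, as a stand-in for your data; the object names `d`, `mean_ranks` and `kw` are only illustrative.

```r
# Stand-in data: first five rows of each group of the built-in PlantGrowth set.
d <- do.call(rbind, lapply(split(PlantGrowth, PlantGrowth$group), head, 5))

# Rank all 15 observations together; tied values share their average rank.
d$rank <- rank(d$weight)

# Mean rank per group: the quantity the Kruskal-Wallis statistic is built on.
mean_ranks <- tapply(d$rank, d$group, mean)
print(mean_ranks)

# Base R's Kruskal-Wallis test on the same data, for comparison.
kw <- kruskal.test(weight ~ group, data = d)
print(kw)
```

The mean ranks live on the scale 1 to N (here 1 to 15), not on the scale of the response, which is why they are hard to read on their own.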

I am not sure which software package you are using, but below is an example in R that illustrates what I mentioned above. Note: this approach may not look nice if the numerical values of the data points and the ranks differ greatly. In that case, I would plot the data points and the significant differences via letters, and report the ranks in a separate table.

### required packages
require(tidyverse)
#> Loading required package: tidyverse
require(agricolae)
#> Loading required package: agricolae
### set seed for reproducibility
set.seed(564)

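### subset the PlantGrowth dataset (available in R) to replicate your n=5 scenario
### note: sample(1:5) only permutes the indices 1 to 5, so this keeps the first five rows of each group (in shuffled order)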
PlantGrowth %>% 
  group_by(group) %>% 
  slice(sample(1:5)) -> d_sub

### run Kruskal test from the agricolae package
k <- kruskal(d_sub$weight, d_sub$group, console = TRUE)
#> 
#> Study: d_sub$weight ~ d_sub$group
#> Kruskal-Wallis test's
#> Ties or no Ties
#> 
#> Critical Value: 3.290877
#> Degrees of freedom: 2
#> Pvalue Chisq  : 0.192928 
#> 
#> d_sub$group,  means of the ranks
#> 
#>      d_sub.weight r
#> ctrl          8.3 5
#> trt1          5.3 5
#> trt2         10.4 5
#> 
#> Post Hoc Analysis
#> 
#> t-Student: 2.178813
#> Alpha    : 0.05
#> Minimum Significant Difference: 5.816519 
#> 
#> Treatments with the same letter are not significantly different.
#> 
#>      d_sub$weight groups
#> trt2         10.4      a
#> ctrl          8.3      a
#> trt1          5.3      a

### create summary table incl. mean ranks and significant-difference letters
(t_comp <- k$means %>% 
    rownames_to_column(var = "group") %>%
    rename(weight = d_sub.weight) %>%
    as_tibble() %>% 
    left_join(as_tibble(k$groups), by = c("rank" = "d_sub$weight")))
#> # A tibble: 3 x 11
#>   group weight  rank   std     r   Min   Max   Q25   Q50   Q75 groups
#>   <chr>  <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> 
#> 1 ctrl    5.11   8.3 0.788     5  4.17  6.11  4.5   5.18  5.58 a     
#> 2 trt1    4.57   5.3 0.851     5  3.59  5.87  4.17  4.41  4.81 a     
#> 3 trt2    5.57  10.4 0.446     5  5.12  6.31  5.37  5.5   5.54 a

### create plot with ranks as blue dots and align the letters next to them
d_sub %>% 
  ggplot(aes(x = group, y = weight)) +
  geom_point(color = "grey50", size = 2) +
  # add ranks as separate points
  geom_point(data = t_comp, aes(x = group, y = rank), col = "blue", size = 3) + 
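  # add median as horizontal line
  # (ggplot2 >= 3.3.0: `fun` and after_stat() replace the deprecated `fun.y` and ..y..)
  stat_summary(fun = median, geom = "errorbar", aes(ymax = after_stat(y), ymin = after_stat(y)),
               width = .75, col = "red") +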
  # add letters
  geom_text(data = t_comp, aes(x = group, y = rank, label = groups), size = 6, nudge_x = -0.1)

Created on 2020-01-30 by the reprex package (v0.3.0)
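
The example above only covers the Kruskal-Wallis part of the question. For Welch's ANOVA, which compares means without assuming equal variances, one common option is to show the raw points together with each group's mean and a confidence interval computed from that group's own variance. The base R sketch below is a suggestion under that assumption, not part of the original reprex; it again uses the first five rows per group of PlantGrowth as stand-in data, and the names `welch` and `ci` are illustrative.

```r
# Stand-in data: first five rows of each group of the built-in PlantGrowth set.
d <- do.call(rbind, lapply(split(PlantGrowth, PlantGrowth$group), head, 5))

# Welch's ANOVA: one-way test of means that does not assume equal variances.
welch <- oneway.test(weight ~ group, data = d, var.equal = FALSE)
print(welch)

# Per-group mean and 95% t-interval, each based on that group's own variance.
ci <- t(sapply(split(d$weight, d$group), function(x) {
  tt <- t.test(x)
  c(mean = mean(x), lower = tt$conf.int[1], upper = tt$conf.int[2])
}))
print(ci)

# Raw points in grey, with the mean (red diamond) and its interval beside them.
x <- seq_len(nrow(ci))
stripchart(weight ~ group, data = d, vertical = TRUE, method = "jitter",
           jitter = 0.05, pch = 16, col = "grey50",
           xlim = c(0.5, nrow(ci) + 0.5), ylim = range(d$weight, ci))
points(x + 0.2, ci[, "mean"], pch = 18, col = "red", cex = 1.5)
arrows(x + 0.2, ci[, "lower"], x + 0.2, ci[, "upper"],
       angle = 90, code = 3, length = 0.05, col = "red")
```

Using unpooled intervals keeps the plot consistent with the test, since groups with larger spread get visibly wider intervals. With only 5 observations per group the intervals will be wide, so showing the raw points alongside them remains important.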