Kruskal-Wallis Test – How to Interpret Post Hoc Test Results Effectively


I have performed a Kruskal-Wallis test to assess the difference in disease severity (recorded on a scale of 0-10) among different months. Here's the code I used:

kruskal.test(mean_severity ~ month, data = dat)

Kruskal-Wallis rank sum test

data:  mean_severity by month
Kruskal-Wallis chi-squared = 20.172, df = 7, p-value = 0.00521

The p-value indicates a significant difference in disease severity among the months (that is, at least one month differs from the others). To further analyze and visualize the differences, I used the ggstatsplot package with the following code:

ggbetweenstats(data = dat, y = mean_severity, x = month,
               type = "nonparametric", p.adjust.method = "fdr")

My question is: How can I interpret and report the results from the analysis/plot?

Best Answer

Let's create a reproducible example by simulating a dataset with the same structure as yours.

set.seed(123)

# Number of cases per month
cases_per_month <- c(10, 25, 20, 20, 25, 20, 19, 5)

# Months (April to November)
months <- c("April", "May", "June", "July", "August", "September", "October", "November")

# Build one data frame per month and collect them in a list
dat <- list()

for (i in seq_along(months)) {
  month <- rep(months[i], cases_per_month[i])
  # random integer severities between 1 and 10
  severity <- sample.int(n = 10, size = cases_per_month[i], replace = TRUE)

  # introduce differences: square the severities for April, July and October
  if (i %in% c(1, 4, 7)) {
    severity <- severity^2
  }

  dat[[i]] <- data.frame(mean_severity = severity, month = month)
}

# Stack the monthly data frames into a single data frame
dat <- do.call(rbind, dat)

# View the resulting dataset
head(dat)

Now we can run the Kruskal-Wallis test:

kruskal.test(mean_severity ~ month, data = dat)

Kruskal-Wallis rank sum test

data:  mean_severity by month
Kruskal-Wallis chi-squared = 40.506, df = 7, p-value = 1.007e-06
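To make the "chi-squared" and "df" in this output less opaque, here is a minimal sketch that recomputes the Kruskal-Wallis H statistic by hand (rank all observations together, compare the rank sums per group, apply the usual tie correction) and checks it against kruskal.test(). The helper kw_by_hand() and the toy data are hypothetical and only for illustration; df is the number of groups minus one, which is why 8 months give df = 7 above.

```r
# Recompute the Kruskal-Wallis H statistic by hand (with tie correction)
kw_by_hand <- function(y, g) {
  g <- as.factor(g)
  N <- length(y)
  r <- rank(y)                    # pooled ranks; tied values get mid-ranks
  n_i <- tapply(r, g, length)     # group sizes
  R_i <- tapply(r, g, sum)        # rank sum of each group
  H <- 12 / (N * (N + 1)) * sum(R_i^2 / n_i) - 3 * (N + 1)
  ties <- table(y)                # sizes of the groups of tied values
  H <- H / (1 - sum(ties^3 - ties) / (N^3 - N))
  df <- nlevels(g) - 1            # number of groups minus one
  list(statistic = H, df = df,
       p.value = pchisq(H, df = df, lower.tail = FALSE))
}

# Hypothetical toy data: three groups of four observations
y <- c(2, 3, 3, 5, 1, 8, 9, 9, 10, 4, 6, 7)
g <- rep(c("A", "B", "C"), each = 4)

mine <- kw_by_hand(y, g)
ref  <- kruskal.test(y ~ factor(g))

# The two statistics should agree
c(by_hand = mine$statistic, kruskal_test = unname(ref$statistic))
```

The p-value is then the upper tail of a chi-squared distribution with that df, which is an approximation that works best when the groups are not too small.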

We can then use ggbetweenstats to plot the results of a post hoc pairwise comparison test:

library(ggstatsplot)
ggbetweenstats(data = dat, y = mean_severity, x = month, type = "nonparametric")

At the top of the plot, we can see the p-values of Dunn's test for the pairs of groups that are statistically different. For some reason, these are not shown in the image in the question; one possibility is that no pairwise comparison remained significant after the FDR correction, since ggbetweenstats displays only significant comparisons by default. A line connects each pair of groups that are statistically different. In this case, for example, April differs from August and from June, but not from July.

[Figure: ggbetweenstats plot of mean_severity by month for the simulated data, with Dunn's test p-values displayed at the top]

The plot is simply a visualization of Dunn's test. To run this test, ggstatsplot uses the function kwAllPairsDunnTest from the PMCMRplus package, with a Holm correction for multiple comparisons by default (in your code, p.adjust.method = "fdr" selects the Benjamini-Hochberg correction instead). The table below reports the adjusted p-values that are represented in the figure. For example, April and August differ (p-value = 1.3e-05), but April and July do not (p-value = 1.0000).

library(PMCMRplus)
kwAllPairsDunnTest(x = dat$mean_severity, g = as.factor(dat$month), p.adjust.method = "holm")

          April   August  July   June   May    November October
August    1.3e-05 -       -      -      -      -        -      
July      1.0000  8.9e-05 -      -      -      -        -      
June      0.0307  0.5488  0.3327 -      -      -        -      
May       0.0417  0.2328  0.4570 1.0000 -      -        -      
November  1.0000  0.5182  1.0000 1.0000 1.0000 -        -      
October   0.7757  0.0054  1.0000 1.0000 1.0000 1.0000   -      
September 0.0049  1.0000  0.0552 1.0000 1.0000 1.0000   0.5488 
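To show where these p-values come from, here is a minimal base-R sketch of Dunn's all-pairs test: for each pair of groups, the difference in mean ranks (ranks computed on the pooled data) is divided by its standard error, using a tie-corrected variance, and compared with a standard normal; the raw p-values are then adjusted with p.adjust(). The helper dunn_by_hand() and the toy data are hypothetical. I believe this mirrors what kwAllPairsDunnTest computes, but treat that as an assumption and compare against the package output on your own data.

```r
# Dunn's all-pairs z statistics with a tie-corrected variance,
# followed by a p-value adjustment (Holm by default)
dunn_by_hand <- function(y, g, p.adjust.method = "holm") {
  g <- as.factor(g)
  N <- length(y)
  r <- rank(y)                          # pooled ranks across all groups
  mean_rank <- tapply(r, g, mean)       # mean rank of each group
  n_i <- tapply(r, g, length)           # group sizes
  ties <- table(y)
  # variance term N(N+1)/12, reduced by the tie correction
  s2 <- N * (N + 1) / 12 - sum(ties^3 - ties) / (12 * (N - 1))
  pairs <- combn(levels(g), 2)          # one column per pair of groups
  z <- unname(apply(pairs, 2, function(p) {
    (mean_rank[p[1]] - mean_rank[p[2]]) /
      sqrt(s2 * (1 / n_i[p[1]] + 1 / n_i[p[2]]))
  }))
  p_raw <- 2 * pnorm(-abs(z))           # two-sided normal p-values
  data.frame(group1 = pairs[1, ], group2 = pairs[2, ], z = z,
             p_raw = p_raw,
             p_adj = p.adjust(p_raw, method = p.adjust.method))
}

# Hypothetical toy data: three groups of four observations
y <- c(2, 3, 3, 5, 1, 8, 9, 9, 10, 4, 6, 7)
g <- rep(c("A", "B", "C"), each = 4)

res <- dunn_by_hand(y, g)
res
```

Passing p.adjust.method = "fdr" instead would give the Benjamini-Hochberg adjustment used in the question's ggbetweenstats call; the Holm adjustment is also why several entries in the table above are capped at 1.0000.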

Roughly speaking, when the boxes in a box plot do not overlap, you might suspect a statistically significant difference between the corresponding groups. However, this is only a rough guide, and a formal test is required to assess the differences rigorously. Moreover, with skewed data, as in the figure from the question where most of the boxes are squeezed at the bottom of the plot, it can be hard to judge overlaps by eye. This is why it is useful to check the p-values at the top of the figure produced by ggbetweenstats and/or inspect the output table of Dunn's test.
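When reading the Dunn table, it can help to turn it into a plain list of the pairs you would report as different. The sketch below copies the Holm-adjusted p-values from the table above into a lower-triangular matrix laid out the same way, and lists the pairs below a significance level alpha = 0.05 (a conventional choice, assumed here). The helper sig_pairs() is hypothetical, not part of PMCMRplus or ggstatsplot; I believe the object returned by kwAllPairsDunnTest stores this matrix in its p.value element, but check that with str() before relying on it.

```r
# Holm-adjusted p-values copied from the Dunn table above (lower triangle, row by row)
rows <- c("August", "July", "June", "May", "November", "October", "September")
cols <- c("April", "August", "July", "June", "May", "November", "October")
p <- c(1.3e-05,
       1.0000, 8.9e-05,
       0.0307, 0.5488, 0.3327,
       0.0417, 0.2328, 0.4570, 1.0000,
       1.0000, 0.5182, 1.0000, 1.0000, 1.0000,
       0.7757, 0.0054, 1.0000, 1.0000, 1.0000, 1.0000,
       0.0049, 1.0000, 0.0552, 1.0000, 1.0000, 1.0000, 0.5488)

# Filling the upper triangle column by column and transposing
# gives the lower triangle filled row by row, as printed in the table
tmp <- matrix(NA_real_, nrow = 7, ncol = 7)
tmp[upper.tri(tmp, diag = TRUE)] <- p
pmat <- t(tmp)
dimnames(pmat) <- list(rows, cols)

# Hypothetical helper: list the pairs whose adjusted p-value is below alpha
sig_pairs <- function(pmat, alpha = 0.05) {
  idx <- which(pmat < alpha, arr.ind = TRUE)   # NA cells are ignored by which()
  out <- data.frame(group1 = rownames(pmat)[idx[, 1]],
                    group2 = colnames(pmat)[idx[, 2]],
                    p_adj  = pmat[idx])
  out <- out[order(out$p_adj), ]
  rownames(out) <- NULL
  out
}

sig <- sig_pairs(pmat)
sig
```

Each row is a pair of months that differs after the Holm correction; pairs that are absent, such as April and July, show no significant difference at this level.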