Kruskal-Wallis Test – Understanding Kruskal-Wallis and Dunn’s Test


I am trying to interpret the results of some experiments I have run, involving showing participants videos of animated synthesised speech.

I ran a mean opinion score test with 36 movies in total, 9 for each of the following exaggeration factors: 1 (A), 1.1 (B), 1.2 (C), 1.3 (D).

11 participants were shown the movies in a randomised order, and asked to score each movie between 1 and 5, based on their opinion of the quality of animation.

The mean opinion scores were: A – 2.41, B – 3.06, C – 3.3, D – 2.99, making C (1.2) the favourite.

A Kruskal-Wallis test (null hypothesis: the samples come from the same distribution) gives a $p$-value of 1.8595e-08, so data like these would be extremely unlikely if the samples came from the same distribution. I therefore reject the null hypothesis; the test is significant at the 0.01 level.

I then ran Dunn's test to try to ascertain which of the treatments is significantly preferred. Using the MATLAB function Dunn, I get the following results, but I don't know what they are telling me.

STEPDOWN DUNN TEST FOR NON PARAMETRIC MULTIPLE COMPARISONS

Group     N            Sum of ranks         Mean rank
 1        99             13893.00              140.33
 2        99             20879.50              210.90
 3        99             23569.50              238.08
 4        99             20264.00              204.69

Ties factor: 6882

Test        Q-value        Critical Q         Comment
3 vs 1      6.0084         2.6310             Reject Ho
3 vs 4      2.0525         2.6310             Fail to reject Ho
3 vs 2      No comparison made                Accept Ho
2 vs 1      4.3381         2.6310             Reject Ho
2 vs 4      0.3822         2.6310             Fail to reject Ho
4 vs 1      3.9559         2.6310             Reject Ho

Resuming...
     0     1     1     1
     0     0     0     0
     0     0     0     0
     0     0     0     0
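
As a cross-check on the table above, the Q-values can be reproduced from the printed rank sums. The sketch below is in Python rather than MATLAB and rests on two assumptions that are not confirmed by the function's documentation: that Q is the difference in mean ranks divided by a standard error of the form $\sqrt{\left(\frac{N(N+1)}{12} - \frac{T}{12(N-1)}\right)\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$, with $T$ the printed ties factor, and that the critical value is a two-sided normal quantile with a Šidák adjustment at $\alpha = 0.05$ over all six pairs. Both choices happen to match the printed numbers. The sketch ignores the step-down logic, so it also evaluates the 3 vs 2 comparison that the function skipped.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

# Values copied from the printed output above.
rank_sums = {1: 13893.00, 2: 20879.50, 3: 23569.50, 4: 20264.00}
n_per_group = 99
ties_factor = 6882.0

k = len(rank_sums)                 # number of groups
N = k * n_per_group                # total number of observations
mean_rank = {g: s / n_per_group for g, s in rank_sums.items()}

# ASSUMPTION: variance term with the ties factor entering as T / (12 (N - 1)).
variance = N * (N + 1) / 12 - ties_factor / (12 * (N - 1))
se = sqrt(variance * (1 / n_per_group + 1 / n_per_group))

# ASSUMPTION: two-sided normal critical value, Sidak-adjusted over all pairs.
alpha = 0.05
n_pairs = k * (k - 1) // 2
alpha_sidak = 1 - (1 - alpha) ** (1 / n_pairs)
critical_q = NormalDist().inv_cdf(1 - alpha_sidak / 2)

# Q-value for every pair of groups (no step-down shortcut).
q_values = {
    (a, b): abs(mean_rank[a] - mean_rank[b]) / se
    for a, b in combinations(sorted(rank_sums), 2)
}

print(f"critical Q = {critical_q:.4f}")
for (a, b), q in sorted(q_values.items(), key=lambda item: -item[1]):
    verdict = "reject H0" if q > critical_q else "fail to reject H0"
    print(f"{a} vs {b}: Q = {q:.4f} -> {verdict}")
```

Read this way, every rejected comparison involves group 1 (factor 1, A), which ranks lower than each of the other three groups, while groups 2, 3 and 4 are not distinguished from one another. The first row of the summary matrix (0 1 1 1) appears to flag the same three pairs.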

Any help greatly appreciated.

Friedman's ANOVA Table
    Source     SS         df    MS        Chi-sq    Prob>Chi-sq
    Columns    27.2841    1     27.2841   55.5501   9.1106e-14
    Error      167.2159   395   0.42333
    Total      194.5      791
    Test for column effects after row effects are removed

Best Answer

Three points:

  1. The null hypothesis of the Kruskal-Wallis test is not what you have written. It is stochastic equality, $H_{0}\colon P\left(X_{i} > X_{j}\right) = 0.5$ for all $i \neq j$ in $\{1,\dots,k\}$ for $k$ groups, so (assuming the CDFs of any two groups do not cross) you are testing for stochastic dominance. Under the more stringent assumptions that every treatment has an identically shaped distribution and that the groups differ only by a location shift, you can interpret the null hypothesis as equality of medians, and the test as a test for a difference in medians; and

  2. Your data do not sound appropriate for the Kruskal-Wallis test, because you have a blocked study design in which the same individuals are measured repeatedly. You therefore need a repeated-measures test for 'treatment'. Given your scoring variable (an opinion score from 1 to 5), repeated-measures ANOVA is perhaps not a good candidate, but the nonparametric Friedman test may well suit your needs; plus

  3. Tests like Kruskal-Wallis and Friedman assume that the data (your scores) are measured on a continuous scale. Nonparametric tests often include 'corrections for ties', but you should check that your statistical software applies one, and bear in mind that many ties (as are likely when there are only five possible scores) may distort your results.
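
To make points 2 and 3 concrete, here is a minimal Python sketch of a tie-corrected Friedman statistic, written with the standard library only. The data are entirely made up for illustration: each row is a hypothetical participant (block) and each column is one exaggeration factor (A to D). Reducing the nine movies per factor to one value per participant is an assumption of this sketch, not something prescribed above. The function names `midranks` and `friedman_statistic` are likewise illustrative.

```python
from collections import Counter

def midranks(values):
    """Rank values 1..k, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average_rank
        i = j + 1
    return ranks

def friedman_statistic(blocks):
    """Return (tie-corrected Friedman statistic, per-treatment rank sums)."""
    n = len(blocks)        # number of blocks (participants)
    k = len(blocks[0])     # number of treatments (exaggeration factors)
    rank_sums = [0.0] * k
    tie_term = 0
    for row in blocks:
        for col, rank in enumerate(midranks(row)):
            rank_sums[col] += rank
        # Sum of t^3 - t over groups of tied scores within this block.
        tie_term += sum(t ** 3 - t for t in Counter(row).values())
    expected = n * (k + 1) / 2
    numerator = 12 * sum((r - expected) ** 2 for r in rank_sums)
    denominator = n * k * (k + 1) - tie_term / (k - 1)
    return numerator / denominator, rank_sums

# HYPOTHETICAL scores: 5 participants x 4 factors (A, B, C, D).
scores = [
    [2, 3, 4, 3],
    [3, 3, 4, 2],
    [1, 3, 3, 2],
    [2, 4, 5, 3],
    [2, 3, 4, 4],
]

statistic, rank_sums = friedman_statistic(scores)
blocks_with_ties = sum(1 for row in scores if len(set(row)) < len(row))

print("rank sums per factor:", rank_sums)
print(f"Friedman statistic (tie-corrected): {statistic:.4f}")
print(f"blocks containing ties: {blocks_with_ties} of {len(scores)}")
```

The statistic is referred to a chi-squared distribution with $k - 1$ degrees of freedom (critical value about 7.815 for 3 degrees of freedom at the 0.05 level). Even in this small made-up example most blocks contain ties, which is the situation point 3 warns about; SciPy's `scipy.stats.friedmanchisquare` is an existing routine that could be used in place of the hand-rolled function.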
