I am trying to interpret the results of some experiments I have run in which participants were shown videos of animated synthesised speech.
I ran a mean opinion score (MOS) test using 36 movies in total, 9 for each of the following exaggeration factors: 1 (A), 1.1 (B), 1.2 (C) and 1.3 (D).
Eleven participants were shown the movies in a randomised order and asked to score each movie from 1 to 5, based on their opinion of the quality of the animation.
The mean opinion scores were A: 2.41, B: 3.06, C: 3.3 and D: 2.99, making C (factor 1.2) the favourite.
A Kruskal-Wallis test (null hypothesis: the samples come from the same distribution) puts the $p$-value at 1.8595e-08, so results this extreme would be very unlikely if the samples did come from the same distribution. Therefore we reject the null hypothesis; the test is significant at the 0.01 level.
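As a cross-check, the Kruskal-Wallis statistic can be rebuilt from the group rank sums that the Dunn output below reports. The sketch uses $H = \frac{12}{N(N+1)} \sum_i \frac{R_i^2}{n_i} - 3(N+1)$, divides it by the tie correction $1 - \frac{T}{N^3 - N}$, and refers the result to a chi-square distribution with $k - 1 = 3$ degrees of freedom. Here $N = 396$ scores (36 movies $\times$ 11 participants) and $n_i = 99$ per factor.

This is a minimal Python sketch, not the MATLAB implementation. It assumes that $T$ is the "Ties factor: 6882" line, read as $\sum (t^3 - t)$ over groups of tied scores. That reading is an inference, not documented behaviour. `chi2_sf` and `kruskal_wallis_from_rank_sums` are hypothetical helper names.

```python
import math

# Rank sums and group sizes copied from the Dunn output (groups 1-4 = factors A-D).
RANK_SUMS = [13893.0, 20879.5, 23569.5, 20264.0]
SIZES = [99, 99, 99, 99]
# Assumption: "Ties factor: 6882" is sum(t**3 - t) over groups of tied scores.
TIES_FACTOR = 6882.0


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer df."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i < df/2} y**i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) plus a finite sum over half-integer powers of y.
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def kruskal_wallis_from_rank_sums(rank_sums, sizes, ties_factor=0.0):
    """Kruskal-Wallis H from per-group rank sums, divided by the usual tie correction."""
    n_total = sum(sizes)
    h = 12.0 / (n_total * (n_total + 1)) * sum(
        r * r / n for r, n in zip(rank_sums, sizes)
    ) - 3.0 * (n_total + 1)
    h /= 1.0 - ties_factor / (n_total ** 3 - n_total)
    df = len(sizes) - 1
    return h, df, chi2_sf(h, df)


h, df, p = kruskal_wallis_from_rank_sums(RANK_SUMS, SIZES, TIES_FACTOR)
print(f"H = {h:.3f} on {df} df, p = {p:.4e}")
```

With these inputs the sketch gives $H \approx 38.86$ and $p \approx 1.86 \times 10^{-8}$, in line with the reported 1.8595e-08.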
So I then ran Dunn's test to try to ascertain which of the treatments is significantly preferred. Using the MATLAB function Dunn, I get the following results, but I don't know what they are telling me.
STEPDOWN DUNN TEST FOR NON PARAMETRIC MULTIPLE COMPARISONS

| Group | N | Sum of ranks | Mean rank |
|---|---|---|---|
| 1 | 99 | 13893.00 | 140.33 |
| 2 | 99 | 20879.50 | 210.90 |
| 3 | 99 | 23569.50 | 238.08 |
| 4 | 99 | 20264.00 | 204.69 |

Ties factor: 6882

| Test | Q-value | Critical Q | Comment |
|---|---|---|---|
| 3 vs 1 | 6.0084 | 2.6310 | Reject Ho |
| 3 vs 4 | 2.0525 | 2.6310 | Fail to reject Ho |
| 3 vs 2 | No comparison made | | Accept Ho |
| 2 vs 1 | 4.3381 | 2.6310 | Reject Ho |
| 2 vs 4 | 0.3822 | 2.6310 | Fail to reject Ho |
| 4 vs 1 | 3.9559 | 2.6310 | Reject Ho |

Resuming...

$$\begin{bmatrix} 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
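To see where these Q-values come from, the minimal Python sketch below recomputes them from the rank sums in the output. It rests on two assumptions.

- The statistic is the standard Dunn form: the absolute difference in mean ranks divided by $\sqrt{\left(\frac{N(N+1)}{12} - \frac{T}{12(N-1)}\right)\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$, with $N = 396$ and $T$ taken to be the reported ties factor 6882.
- The critical value is a Šidák-adjusted two-sided normal quantile for the 6 pairwise comparisons at $\alpha = 0.05$.

Both are inferences from the numbers, not documented behaviour of the MATLAB function. `dunn_q` and `sidak_critical_q` are hypothetical helper names.

```python
import math
from statistics import NormalDist

# Rank sums and group sizes copied from the Dunn output above.
RANK_SUMS = {1: 13893.0, 2: 20879.5, 3: 23569.5, 4: 20264.0}
SIZES = {1: 99, 2: 99, 3: 99, 4: 99}
# Assumption: "Ties factor: 6882" is sum(t**3 - t) over groups of tied scores.
TIES_FACTOR = 6882.0


def dunn_q(i, j, rank_sums=RANK_SUMS, sizes=SIZES, ties_factor=TIES_FACTOR):
    """Dunn's statistic: |difference in mean ranks| / its standard error under H0."""
    n_total = sum(sizes.values())
    mean_i = rank_sums[i] / sizes[i]
    mean_j = rank_sums[j] / sizes[j]
    # Dunn's variance term N(N+1)/12, reduced by the tie adjustment.
    variance = n_total * (n_total + 1) / 12.0 - ties_factor / (12.0 * (n_total - 1))
    se = math.sqrt(variance * (1.0 / sizes[i] + 1.0 / sizes[j]))
    return abs(mean_i - mean_j) / se


def sidak_critical_q(alpha, n_comparisons):
    """Two-sided normal critical value after a Sidak adjustment (an assumption)."""
    alpha_each = 1.0 - (1.0 - alpha) ** (1.0 / n_comparisons)
    return NormalDist().inv_cdf(1.0 - alpha_each / 2.0)


k = len(SIZES)
critical = sidak_critical_q(0.05, k * (k - 1) // 2)
# The pairs the step-down output actually tested (3 vs 2 was skipped there).
for i, j in [(3, 1), (3, 4), (2, 1), (2, 4), (4, 1)]:
    q = dunn_q(i, j)
    verdict = "reject H0" if q > critical else "fail to reject H0"
    print(f"{i} vs {j}: Q = {q:.4f}, critical Q = {critical:.4f}, {verdict}")
```

Under those assumptions the sketch reproduces the tabulated Q-values and the critical value of 2.6310 to four decimal places.

Read together, the rejections say that factor 1 (A) was ranked significantly lower than each of B, C and D. B, C and D could not be distinguished from one another at this level. The 0/1 matrix after "Resuming..." appears to encode the same pattern, with a 1 in row 1 for columns 2, 3 and 4.

The 3 vs 2 pair appears to be missing because of the step-down order. Once 3 vs 4 is not significant, pairs whose mean ranks lie inside that range are not tested, and 3 vs 2 is such a pair.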
Any help greatly appreciated.
Friedman's ANOVA Table

| Source | SS | df | MS | Chi-sq | Prob>Chi-sq |
|---|---|---|---|---|---|
| Columns | 27.2841 | 1 | 27.2841 | 55.5501 | 9.1106e-14 |
| Error | 167.2159 | 395 | 0.42333 | | |
| Total | 194.5 | 791 | | | |

Test for column effects after row effects are removed
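The Friedman table above can be checked the same way. Its numbers are consistent with the tie-corrected Friedman statistic $\chi^2 = SS_{\text{Columns}} / \left(SS_{\text{Total}} / (df_{\text{Columns}} + df_{\text{Error}})\right)$. That statistic is referred to a chi-square distribution with $df_{\text{Columns}}$ degrees of freedom.

The minimal Python sketch below assumes that relationship, which is inferred from the table rather than taken from MATLAB's documentation. It also uses the one-degree-of-freedom closed form $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$.

```python
import math

# Values copied from the Friedman's ANOVA table above.
SS_COLUMNS, DF_COLUMNS = 27.2841, 1
SS_ERROR, DF_ERROR = 167.2159, 395

ss_total = SS_COLUMNS + SS_ERROR  # should match the table's Total SS (194.5)
ms_error = SS_ERROR / DF_ERROR  # should match the table's Error MS (0.42333)

# Assumed relationship: chi-sq = SS_columns / (SS_total / (df_columns + df_error)).
chi_sq = SS_COLUMNS / (ss_total / (DF_COLUMNS + DF_ERROR))
# With 1 degree of freedom, P(chi-sq_1 > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_sq / 2.0))
print(f"Chi-sq = {chi_sq:.4f}, p = {p_value:.4e}, MS error = {ms_error:.5f}")
```

Since $df_{\text{Columns}} = k - 1 = 1$, this table corresponds to two columns. With one column per exaggeration factor ($k = 4$), the Columns df would be 3, so it may be worth checking how the data matrix was laid out.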
Best Answer
Three points:
The null hypothesis of the Kruskal-Wallis test is not what you have written. It is rather stochastic equality, $H_0\colon P(X_i > X_j) = 0.5$ for all $i \neq j$, $i, j \in \{1,\dots,k\}$, for $k$ groups (assuming the CDFs of any two groups do not cross), so you are testing for stochastic dominance. Under the more stringent assumptions that each treatment has an identically shaped distribution and that the differences are entirely location shifts, you can interpret the null hypothesis as equality of medians, and the test as a test for a difference in medians; and
Your data do not sound suited to the Kruskal-Wallis test, because you have a blocked study design in which the same individuals are measured repeatedly. You are therefore looking for a repeated-measures test for 'treatment'. Given your scoring variable (ratings on a 1 to 5 scale), repeated-measures ANOVA is perhaps not a good candidate, but the nonparametric Friedman test may well suit your needs; plus
Tests like Kruskal-Wallis and Friedman assume that the data (your scores) are continuously measured. Nonparametric tests often include 'corrections for ties', but you should make sure that your statistical software applies one. Bear in mind that many ties (as might happen when there are only five possible scores) may distort your results.
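The three points above can be made concrete with a short Python sketch.

- `prob_superiority` estimates the quantity in the Kruskal-Wallis null. It counts ties as half, a common convention for discrete scores that extends the strict-inequality form.
- `friedman` ranks scores within each block (row) only, as a blocked design requires. Tied scores within a row get midranks, and the statistic uses the tie-corrected denominator $\sum_{b,j} \left(r_{bj} - \frac{k+1}{2}\right)^2$ in place of the no-ties constant.

The function names and the example ratings are hypothetical. The p-value uses the large-sample chi-square approximation, which can be rough when there are few blocks.

```python
import math


def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for idx in order[start:end + 1]:
            ranks[idx] = (start + end) / 2.0 + 1.0  # positions are 0-based
        start = end + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    y = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def prob_superiority(x, y):
    """Estimate P(X > Y) + 0.5 * P(X = Y) over all pairs; 0.5 under stochastic equality."""
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    return wins / (len(x) * len(y))


def friedman(scores):
    """Friedman test: rows are blocks (e.g. participants), columns are treatments."""
    n, k = len(scores), len(scores[0])
    ranks = [midranks(row) for row in scores]  # rank within each block only
    col_sums = [sum(row[j] for row in ranks) for j in range(k)]
    expected = n * (k + 1) / 2.0  # column rank sum expected under H0
    numerator = (k - 1) * sum((r - expected) ** 2 for r in col_sums)
    # Tie-corrected denominator: squared deviations of every rank from (k + 1) / 2.
    denominator = sum((r - (k + 1) / 2.0) ** 2 for row in ranks for r in row)
    stat = numerator / denominator
    return stat, k - 1, chi2_sf(stat, k - 1)


# Hypothetical 1-5 ratings: one row per participant, one column per factor A-D.
ratings = [
    [2, 3, 4, 3],
    [2, 4, 4, 3],
    [3, 3, 4, 2],
    [1, 3, 3, 3],
    [2, 4, 5, 4],
]
stat, df, p = friedman(ratings)
print(f"Friedman chi-sq = {stat:.3f} on {df} df, p = {p:.4f}")
col_a = [row[0] for row in ratings]
col_c = [row[2] for row in ratings]
print(f"P(C > A) + 0.5 * P(C = A) = {prob_superiority(col_c, col_a):.3f}")
```

For these made-up ratings, every row contains ties. The corrected denominator is 21 rather than the no-ties value of $nk(k^2-1)/12 = 25$, so the statistic is $229.5/21 \approx 10.93$ on 3 degrees of freedom.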