Solved – Friedman’s test is very significant, but its post hoc comparisons (SPSS) are not significant

dunn-test, friedman-test, multiple-comparisons, post-hoc, spss

I ran the non-parametric Friedman test on my data in SPSS 22 and rejected the null hypothesis at a very significant level. That would mean that among the $k$ paired samples (3 in my case) there should be at least two samples with unequal distributions, one tending to be greater than the other. So post hoc comparisons are justified.

However, when I then run the SPSS built-in post-Friedman pairwise multiple comparisons, which, according to this SPSS note, are based on Dunn's (1964) approach with the Bonferroni correction, I get non-significance for all the pairs. The omnibus Friedman significance was very convincing ($p=0.002$), but the pairwise post hoc tests are all non-significant, even before the Bonferroni adjustment.

[SPSS output screenshots: the omnibus Friedman test result and the pairwise comparisons table]

Why is that? Am I doing something wrong, or is SPSS?

What is the proper after-Friedman post hoc pairwise testing?

The sample dataset is available here as SPSS data, or as printed below:

V1  V2  V3
5   5   5
4   4   5
5   3   5
4   5   5
5   5   5
5   5   5
5   5   4
5   5   5
5   5   4
5   5   5
5   5   5
4   4   4
4   4   4
4   5   5
3   3   3
4   4   5
3   5   2
5   5   5
3   3   5
4   4   4
5   5   5
5   4   5
5   5   5
5   5   5
4   4   5
5   5   5
5   5   5
5   4   5
5   5   5
5   5   5
4   4   4
4   4   4
5   5   5
4   4   4
4   5   4
5   5   5
4   4   4
4   4   4
4   5   4
5   5   5
5   5   5
5   5   5
5   4   4
5   5   5
4   5   5
5   5   5
5   5   5
5   5   5
5   5   5
4   4   4
5   5   4
5   5   5
5   5   4
5   4   4
5   5   5
4   4   4
4   4   4
5   4   3
5   5   4
4   5   4
5   5   5
5   5   5
4   4   4
5   5   4
5   4   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   4   5
5   5   5
5   5   5
5   4   5
5   5   5
5   5   5
5   5   4
4   4   4
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   4   5
5   5   5
5   5   5
5   5   5
4   4   3
4   4   4
5   5   4
4   4   5
4   5   4
4   3   4
4   4   4
4   4   4
4   4   4
5   4   4
5   4   4
2   2   3
4   4   5
4   4   4
5   4   5
4   4   3
4   4   4
4   4   5
5   2   5
4   3   5
4   4   4
4   5   4
4   4   4
4   5   5
5   5   5
5   5   5
4   5   4
5   3   5
5   5   5
5   4   5
5   3   5
2   3   5
5   5   5
5   5   5
4   4   4
5   5   4
4   5   5
5   5   5
5   5   5
3   4   4
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   4
5   5   5
5   5   5
5   5   3
5   5   3
5   5   5
5   5   3
5   5   4
5   5   3
5   5   3
5   5   5
5   5   5
5   5   3
5   5   4
5   5   3
5   5   5
5   5   3
5   5   5
5   5   3
5   5   4
5   5   5
5   5   5
4   4   4
4   4   4
3   4   4
4   5   5
3   5   4
3   5   4
5   5   5
3   3   4
5   5   5
5   5   5
5   5   4
4   4   4
4   4   4
4   4   4
5   5   5
3   2   4
3   2   4
4   4   5
5   5   5
3   1   2
5   4   1
5   4   5
5   5   5
5   4   3
4   5   4
2   3   5
3   2   1
3   2   2
5   5   5
4   4   5
5   5   1
5   3   3
3   3   4
5   3   4
4   5   5
5   4   3
5   1   4
4   2   2
4   4   2
5   2   1
4   4   5
5   3   5
5   3   5
2   5   4
4   3   4
5   4   4
5   2   1
5   4   2
3   1   5
4   4   5
5   4   2
3   4   1
5   3   2
5   4   5
4   1   5
5   4   5
4   3   5
5   4   5
4   5   5
5   4   4
5   2   2
4   5   4
4   4   5
5   5   3
4   5   4
5   4   4
5   4   4
5   5   5
4   4   4
5   5   5
5   4   3
5   5   5
5   5   5
5   4   5
5   5   5
5   5   5
5   5   5
5   5   5
4   5   5
5   4   4
5   5   5
4   4   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
2   4   5
4   4   4
5   4   4
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
4   4   4
5   5   5
4   5   4
5   4   5
5   5   4
5   4   4
5   5   5
5   2   3
5   2   2
5   2   1
1   1   1
4   4   3
4   4   4
5   4   4
5   5   4
5   4   5
5   4   3
3   5   5
4   3   4
4   3   4
4   4   5
4   4   3
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
4   4   4
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
4   4   4
5   5   5
5   5   5
5   5   5
5   5   5
4   4   4
4   4   4
5   5   5
5   5   4
4   5   5
5   4   4
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
4   4   5
2   4   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
4   4   4
5   5   5
5   5   5
5   4   4
5   4   4
5   5   5
5   5   5
4   5   4
4   4   4
4   3   4
4   4   3
5   4   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   4   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
4   5   5
5   5   5
4   5   4
5   5   5
1   5   4
5   4   5
5   5   5
5   5   5
4   4   4
4   2   5
5   5   5
3   4   5
5   5   5
4   4   4
5   4   4
5   4   5
5   5   5
4   3   4
4   4   4
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5
5   5   5

Best Answer

SPSS Algorithms state that in doing pairwise comparisons after the Friedman test they use Dunn's (1964) procedure. I haven't read Dunn's original paper, so I can't say whether SPSS follows it correctly, but I have just sat down and programmed Friedman's test and its post hoc pairwise comparisons following the above SPSS Algorithms documentation, and I confirm that there is no bug: my results were identical to what SPSS outputs and to what the OP showed in the question. (See my code here.)

According to Dunn's approach (as SPSS carries it out), the test statistic is simply the difference in the mean ranks of the two samples (variables) being compared, the values having first been turned into ranks within cases. (These are the ranks left over from the Friedman test computations, i.e. the ranking of the $k$ values within each case [$k=3$ in our example data], with mean ranks assigned to ties.) The standard error of the statistic is $\sqrt{k(k+1)/(6n)}$. Dividing the test statistic by it yields the standardized statistic $Z$, which is referred to the standard normal distribution to give the two-sided significance (not yet Bonferroni-corrected).
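For readers who want to reproduce this outside SPSS, here is a minimal Python sketch of the procedure as just described (my own re-implementation following the SPSS Algorithms description, not SPSS code); the data layout and the commented file name are assumptions, with rows as cases and columns as V1–V3.

```python
import numpy as np
from scipy.stats import rankdata, norm

def friedman_posthoc_dunn(data):
    """Pairwise comparisons after Friedman, per the SPSS description:
    rank the k values within each case (mean ranks for ties), take the
    difference of two variables' mean ranks, SE = sqrt(k*(k+1)/(6*n)),
    and refer Z to the standard normal distribution."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    ranks = np.apply_along_axis(rankdata, 1, data)  # within-case ranks
    mean_ranks = ranks.mean(axis=0)
    se = np.sqrt(k * (k + 1) / (6.0 * n))
    n_pairs = k * (k - 1) // 2
    results = []
    for i in range(k):
        for j in range(i + 1, k):
            diff = mean_ranks[i] - mean_ranks[j]   # test statistic
            z = diff / se
            p = 2 * norm.sf(abs(z))                # two-sided, uncorrected
            p_adj = min(1.0, p * n_pairs)          # Bonferroni adjustment
            results.append((i, j, diff, z, p, p_adj))
    return mean_ranks, results

# usage (hypothetical file holding the dataset printed in the question):
# data = np.loadtxt("friedman_data.txt", skiprows=1)
# mean_ranks, pairs = friedman_posthoc_dunn(data)
```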

This comparison test looks very conservative. It failed to flag the pair V1-V2 as significant: Z=1.838, p=.066, even though the omnibus Friedman test is strongly significant: p=.002. In contrast, the Sign test for the pair V1-V2 (it is the same regardless of whether you perform it on the raw values or on the ranks left from Friedman) gives Z=3.575, p=.0004.

One reason the SPSS "Dunn's approach" is quite conservative is that its standard-error formula accounts for all $k$ variables, not just the 2 being compared.

Another reason it is so much less powerful than the Sign test is that it is based on all $n$ cases, including those with ties, while the Sign test discards tied cases; and there are many tied cases in our data. The loss of power connected with the treatment of ties in tests such as the Sign test was noted, for example, in this Q/A.

I took V1 and V2 and, for tied cases, broke the ties at random (by adding negative or positive noise), and computed the Sign test (now based on all $n$ cases, of course). 500 such trials gave a mean Z=1.927, which is far from Z=3.575 and much closer, on the road to conservatism, to the observed Dunn's Z=1.838.
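A rough sketch of that simulation (my own reconstruction, not the exact code I used, and assuming V1 and V2 are the first two columns of the dataset): break ties with tiny random noise, then compute the Sign-test Z (large-sample normal approximation with continuity correction) on all $n$ cases.

```python
import numpy as np

rng = np.random.default_rng(0)

def sign_test_z(x, y):
    """Sign test via the normal approximation with continuity correction.
    Tied pairs are dropped, as in the usual Sign test."""
    d = np.asarray(x, float) - np.asarray(y, float)
    pos, neg = np.sum(d > 0), np.sum(d < 0)
    m = pos + neg
    return (abs(pos - neg) - 1) / np.sqrt(m)

def untie_and_sign(x, y, trials=500, eps=1e-6):
    """Randomly break ties between paired samples, then run the Sign test
    on all n cases; return the mean Z over the trials."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    zs = []
    for _ in range(trials):
        noise = rng.uniform(-eps, eps, size=x.shape) * (x == y)
        zs.append(sign_test_z(x + noise, y))
    return np.mean(zs)

# usage (V1, V2 taken from the dataset printed in the question):
# sign_test_z(V1, V2)     # ties discarded
# untie_and_sign(V1, V2)  # ties broken at random, all n cases used
```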

I am dissatisfied with SPSS's "Dunn's" pairwise comparisons, as they are too conservative/weak. We expect that if an omnibus test is significant, post hoc tests will confirm it often, if not always. In our example, even the Bonferroni-uncorrected p-value could not support the omnibus conclusion.

Is SPSS correct at all in adopting the "Dunn's approach" (originally proposed for Kruskal-Wallis; see also this Q/A) for post-Friedman testing? I can't say, being hardly an expert in multiple comparisons. I would encourage somebody who knows to comment or post a truly helpful answer in this thread.


P.S. I'm quite aware that, while the Friedman test can be seen as an extension of the Sign test from 2 to $k$ samples (variables), a pairwise post hoc test after Friedman is not and should not be exactly the Sign test. Nor would it be the Wilcoxon paired-samples test. The "Dunn's approach" (if adapted to the paired-sample situation) looks plausible as a post hoc because it compares, without further ranking, the "horizontal" ranks obtained in Friedman's test, which reflect all $k$ samples. What bothered me, though, was that the approach appeared overconservative in the example of this post.


Later Addition. To me, Dunn's approach as it is implemented after Friedman's test in SPSS is incorrect. It does not adjust for ties in the same fashion as the parent omnibus test (Friedman) does. Actually, it does not adjust for ties at all, while it should. (The issue of tie handling is touched on in the answer above.)

The formula of Friedman's test statistic (explained in SPSS Algorithms) is $$\chi^2= \frac{\frac{12}{nk(k+1)}\sum_{j=1}^{k} C_j^2-3n(k+1)}{1-\Sigma T/[nk(k^2-1)]}$$ where $C_j$ is the sum of within-case ranks in sample $j$ and $\Sigma T$ sums, over cases, the quantity $\sum(t^3-t)$ computed over the groups of tied ranks within a case.

The denominator of the formula contains the adjustment for ties. If $k=2$, the quantity $\Sigma T/[nk(k^2-1)]$ is simply the proportion of cases in which the two variables are equal (tied).

Consider the Friedman test performed on our variables V1 and V2 ($k=2$). The proportion of tied cases is 287/400 = .7175, and the test statistic is 13.460, df=1, with significance p=.00024. But the "Dunn's" comparison computed following the SPSS formulas is

Sample1  Sample2  MeanRank1 MeanRank2 TestStat  StError   Z    Sig2side  AdjSig
  V1       V2      1.54875   1.45125   .0975     .0500  1.9500  .05118  .05118

Nonsignificant. Why? Because no proper (Friedman-style) adjustment for ties was done.
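Here is a small Python sketch of that comparison (my own reconstruction following the formulas above, not SPSS code); run on the V1/V2 columns from the question, it should reproduce, up to rounding, the tie-adjusted Friedman chi-square and the "Dunn" Z quoted above.

```python
import numpy as np
from scipy.stats import rankdata, norm, chi2

def friedman_tie_adjusted(data):
    """Friedman chi-square with the tie adjustment in the denominator,
    per the formula quoted from SPSS Algorithms."""
    data = np.asarray(data, float)
    n, k = data.shape
    ranks = np.apply_along_axis(rankdata, 1, data)     # within-case ranks
    C = ranks.sum(axis=0)                              # column rank sums
    num = 12.0 / (n * k * (k + 1)) * np.sum(C ** 2) - 3 * n * (k + 1)
    # tie term: sum over cases of (t^3 - t) for each group of tied ranks
    sum_T = 0.0
    for row in ranks:
        _, counts = np.unique(row, return_counts=True)
        sum_T += np.sum(counts ** 3 - counts)
    denom = 1.0 - sum_T / (n * k * (k ** 2 - 1))
    stat = num / denom
    return stat, chi2.sf(stat, k - 1)

def dunn_z(data, i=0, j=1):
    """SPSS-style post-Friedman 'Dunn' comparison: no tie adjustment."""
    data = np.asarray(data, float)
    n, k = data.shape
    ranks = np.apply_along_axis(rankdata, 1, data)
    diff = ranks[:, i].mean() - ranks[:, j].mean()
    z = diff / np.sqrt(k * (k + 1) / (6.0 * n))
    return z, 2 * norm.sf(abs(z))

# usage with only the two columns V1, V2 (k = 2):
# data2 = data[:, :2]
# friedman_tie_adjusted(data2)   # tie-adjusted omnibus test
# dunn_z(data2)                  # SPSS-style pairwise comparison
```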

In the presence of only $k=2$ samples, a correct post hoc pairwise comparison test must give the same result (statistic and p-value) as the omnibus test; this is in fact the property that proves the post hoc test corresponds (is isomorphic) to the parent omnibus test. It is indeed so with the Kruskal-Wallis test and Dunn's test: just program them following SPSS Algorithms and test V1 and V2 as two independent groups, and you will get the same p=.0153 both for KW and for Dunn. But we saw that this equivalence is absent in the relation between the Friedman test and the "Dunn's approach" post-Friedman comparison test.
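To see that equivalence concretely, here is a quick sketch (my own, using the standard Dunn formula for independent groups with the usual tie correction in the standard error; I am not claiming it is literally the SPSS code). Treating V1 and V2 as two independent groups, the squared Dunn Z should equal the tie-corrected Kruskal-Wallis H and give the same p-value.

```python
import numpy as np
from scipy.stats import rankdata, kruskal, norm

def dunn_independent_z(g1, g2):
    """Dunn's (1964) pairwise z for two independent groups, with the usual
    tie correction in the standard error."""
    g1, g2 = np.asarray(g1, float), np.asarray(g2, float)
    pooled = np.concatenate([g1, g2])
    N = pooled.size
    r = rankdata(pooled)                       # pooled ranks, mean ranks for ties
    r1, r2 = r[:g1.size], r[g1.size:]
    # tie correction: sum of (t^3 - t) over groups of tied values
    _, counts = np.unique(pooled, return_counts=True)
    tie_sum = np.sum(counts ** 3 - counts)
    var = (N * (N + 1) / 12.0 - tie_sum / (12.0 * (N - 1))) \
          * (1.0 / g1.size + 1.0 / g2.size)
    z = (r1.mean() - r2.mean()) / np.sqrt(var)
    return z, 2 * norm.sf(abs(z))

# usage: V1, V2 taken as two independent samples
# H, p_kw = kruskal(V1, V2)              # tie-corrected Kruskal-Wallis
# z, p_dunn = dunn_independent_z(V1, V2) # Dunn pairwise comparison
# for k = 2, z**2 equals H and p_dunn equals p_kw
```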

Conclusion. The post hoc multiple comparison test performed by SPSS (version 22 and earlier) after Friedman's test is defective. Maybe it is correct when there are no ties, but I don't know. The post hoc test does not treat ties the way Friedman does (while it should). I cannot say anything about the standard-error formula, $\sqrt{k(k+1)/(6n)}$, that they use: it was derived from the discrete uniform distribution, but they do not write how; is it correct? Either the "Dunn's test approach" was adapted to Friedman inadequately by SPSS, or Dunn's test cannot be adapted to Friedman at all.