I have data for a test on three groups. The measured variable is ratio-scaled. The R code is:
g1a <- c(7, 3, 40)
g2a <- c(1, 1, 2)
g3a <- c(0, 0, 0)
Since the sample is small and normality cannot be guaranteed, I run a Kruskal-Wallis test to check for significance:
l<-list(g1a,g2a,g3a)
kruskal.test(l)
The p-value is 0.02336, which is nice.
Now I run post-hoc tests, using the Mann-Whitney U test:
wilcox.test(g1a,g2a,paired=FALSE,exact=TRUE)
wilcox.test(g2a,g3a,paired=FALSE,exact=TRUE)
wilcox.test(g1a,g3a,paired=FALSE,exact=TRUE)
All the resulting p-values are above 0.05 (0.07652, 0.0636, 0.05935). This is very strange. Shouldn't at least one of these tests give a much lower p-value, especially since I'd still have to apply some sort of correction for the multiple comparisons in the post-hoc test? In other words: how can I interpret this result?
Best Answer
Think of it this way: overall there's a significant difference, but it's hard to say exactly which two groups differ significantly. Alternatively, consider the chances of getting three p-values below 0.1 (even though they aren't independent of each other): pretty small, right? So, again, overall we might suspect that something significant is in the data without being able to tell exactly where.
Your small sample sizes don't help: they make the power of your tests very low, and they also severely constrain which p-values you can get, as the following example shows:
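One way to set up such an example is the minimal sketch below. It assumes three hypothetical groups of three observations each, fully separated and without ties. The names `a`, `b`, `c` and their values are illustrative assumptions, not data from the question.

```r
# Hypothetical data: three groups of three, no overlap, no ties
a <- c(1, 2, 3)
b <- c(4, 5, 6)
c <- c(7, 8, 9)

# Overall Kruskal-Wallis test across the three groups
kw <- kruskal.test(list(a, b, c))

kw$statistic  # Kruskal-Wallis chi-squared = 7.2 on 2 degrees of freedom
kw$p.value    # about 0.0273, i.e. below 0.05
```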
So far, so good. On to the three Wilcoxon tests:
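A sketch of the pairwise tests, again assuming the same hypothetical, fully separated groups (`a`, `b`, `c` are illustrative, not the question's data). Because there are no ties and the samples are tiny, `wilcox.test` computes exact p-values by default here.

```r
# Same hypothetical data as before: no overlap, no ties
a <- c(1, 2, 3)
b <- c(4, 5, 6)
c <- c(7, 8, 9)

# Pairwise two-sided Mann-Whitney (Wilcoxon rank-sum) tests
t_ab <- wilcox.test(a, b)
t_bc <- wilcox.test(b, c)
t_ac <- wilcox.test(a, c)

# W is 0 in every comparison: each value in the first group lies below
# every value in the second group
w_values <- c(t_ab$statistic, t_bc$statistic, t_ac$statistic)

# Each exact two-sided p-value is 0.1
p_values <- c(t_ab$p.value, t_bc$p.value, t_ac$p.value)
p_values
```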
All three p-values are 0.1, and W = 0 is already the most extreme value the statistic can take, so we have evidently hit a limit on the p-values imposed by the sample size.
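The limit can be made explicit. Assuming no ties and a two-sided exact test, the most extreme arrangement of ranks is one of `choose(n + m, n)` equally likely arrangements under the null hypothesis, so the smallest attainable p-value is `2 / choose(n + m, n)`. The helper `min_exact_p` below is a hypothetical name introduced for illustration only.

```r
# Smallest attainable two-sided exact p-value for the Mann-Whitney test
# with group sizes n and m, assuming no ties (hypothetical helper)
min_exact_p <- function(n, m) 2 / choose(n + m, n)

sizes <- 3:6
floor_table <- data.frame(n_per_group = sizes,
                          min_p = min_exact_p(sizes, sizes))
floor_table
# 3 per group: 2/20  = 0.1
# 4 per group: 2/70  ~ 0.0286
# 5 per group: 2/252 ~ 0.0079
# 6 per group: 2/924 ~ 0.0022
```

With three pairwise comparisons and a Bonferroni threshold of 0.05/3 (about 0.0167), this suggests that at least five observations per group would be needed before any exact pairwise test could reach significance. As far as I can tell, the p-values in the question fall slightly below 0.1 only because the data contain ties, in which case `wilcox.test` cannot compute an exact p-value and falls back to a normal approximation.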