Is it possible for one-way (with $N>2$ groups, or "levels") ANOVA to report a significant difference when none of the $N(N-1)/2$ pairwise t-tests does?
In this answer @whuber wrote:
It is well known that a global ANOVA F test can detect a difference of means even in cases where no individual [unadjusted pairwise] t-test of any of the pairs of means will yield a significant result.
so apparently it is possible, but I do not understand how. When does it happen, and what would the intuition behind such a case be? Maybe somebody can provide a simple toy example of such a situation?
Some further remarks:
- The opposite is clearly possible: the overall ANOVA can be non-significant while some of the pairwise t-tests erroneously report significant differences (i.e. those would be false positives).
- My question is about standard t-tests, not adjusted for multiple comparisons. If adjusted tests are used (e.g. Tukey's HSD procedure), then it is possible that none of them turns out to be significant even though the overall ANOVA is. This is covered here in several questions, e.g. How can I get a significant overall ANOVA but no significant pairwise differences with Tukey's procedure? and Significant ANOVA interaction but non-significant pairwise comparisons.
- Update. My question originally referred to the usual two-sample pairwise t-tests. However, as @whuber pointed out in the comments, in the ANOVA context t-tests are usually understood as post hoc contrasts using the ANOVA estimate of the within-group variance, pooled across all groups (which is not what happens in a two-sample t-test). So there are actually two different versions of my question, and the answer to both of them turns out to be positive. See below.
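To make the distinction between the two versions concrete, here is a small sketch with made-up numbers (the groups `a`, `b`, `c` and their values are purely illustrative) computing both kinds of t-statistic for the same pair of groups:

```python
import numpy as np
from scipy import stats

# Hypothetical toy data: three groups (the numbers are arbitrary)
a = np.array([0.0, 1.0, 2.0])
b = np.array([1.0, 2.0, 3.0])
c = np.array([0.0, 2.0, 4.0])

# Version 1: ordinary two-sample t-test on the pair (a, b).
# Uses only that pair's variances, with n1 + n2 - 2 = 4 d.f.
t2, p2 = stats.ttest_ind(a, b, equal_var=True)   # |t2| ≈ 1.2247

# Version 2: post hoc contrast on the same pair, using the ANOVA
# estimate of the within-group variance pooled across ALL groups
# (the MSE), with N - k = 6 d.f.
groups = [a, b, c]
N = sum(len(g) for g in groups)
k = len(groups)
mse = sum(((g - g.mean()) ** 2).sum() for g in groups) / (N - k)
se = np.sqrt(mse * (1 / len(a) + 1 / len(b)))
t_pooled = (a.mean() - b.mean()) / se            # |t_pooled| ≈ 0.8660
p_pooled = 2 * stats.t.sf(abs(t_pooled), N - k)
```

The two-sample version sees only the pair's own variances with $n_1+n_2-2$ d.f., while the contrast version borrows the pooled MSE with $N-k$ d.f.; that difference is exactly what makes the two readings of the question distinct.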
Best Answer
Note: There was something wrong with my original example. I stupidly got caught by R's silent argument recycling. My new example is quite similar to my old one. Hopefully everything is right now.
Here's an example I made that has the ANOVA significant at the 5% level, while none of the six pairwise comparisons is significant, even at the 5% level.
Here's the data:
Here's the ANOVA:
Here are the two-sample t-test p-values (equal-variance assumption):
With a little more fiddling with the group means or individual points, the difference in significance could be made more striking (that is, I could make the ANOVA p-value smaller and the smallest of the six t-test p-values larger).
--
Edit: Here's an additional example that was originally generated with noise about a trend, which shows how much better you can do if you move points around a little:
The F has a p-value below 3%, and none of the t's has a p-value below 8%. (For a three-group example - but with a somewhat larger p-value on the F - omit the second group.)
And here's a really simple, if more artificial, example with 3 groups:
(In this case the largest variance is in the middle group, but because of the larger sample size there, the standard error of that group's mean is still smaller.)
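Since the original data blocks did not survive here, the following is a construction of my own in the same spirit (the numbers are mine, not the original answer's): two tiny outer groups with small variance, and a large middle group carrying the biggest variance. The overall F is significant at the 5% level, yet none of the three ordinary two-sample t-tests is:

```python
import numpy as np
from scipy import stats
from itertools import combinations

# Illustrative data, made up for this sketch
g1 = np.array([1.3, 1.7])                    # n = 2, mean 1.5, s^2 = 0.08
g2 = np.array([1.65] * 10 + [2.35] * 10)     # n = 20, mean 2.0, s^2 ≈ 0.13 (largest)
g3 = np.array([2.3, 2.7])                    # n = 2, mean 2.5, s^2 = 0.08

# Overall one-way ANOVA: F ≈ 4.02 on (2, 21) d.f., p ≈ 0.033
F, p_anova = stats.f_oneway(g1, g2, g3)

# Ordinary two-sample t-tests (equal-variance) on each of the 3 pairs;
# the smallest p-value comes out around 0.07
pvals = [stats.ttest_ind(a, b, equal_var=True).pvalue
         for a, b in combinations([g1, g2, g3], 2)]
```

The outer pair (g1 vs g3) has only 2 d.f., so even a large mean difference is not significant there, while the pooled ANOVA gets 21 error d.f.; and despite the middle group's larger variance, its standard error of the mean (≈ 0.08) is smaller than the outer groups' (0.2) because of its larger n.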
Multiple comparisons t-tests
whuber suggested I consider the multiple comparisons case. It proves to be quite interesting.
The case for multiple comparisons (all conducted at the original significance level, i.e. without adjusting alpha for multiple comparisons) is somewhat more difficult to achieve, as playing around with larger and smaller variances or more and fewer d.f. in the different groups doesn't help in the same way as it does with ordinary two-sample t-tests.
However, we do still have the tools of manipulating the number of groups and the significance level; if we choose more groups and smaller significance levels, it again becomes relatively straightforward to identify cases. Here's one:
Take eight groups with $n_i=2$. Define the values in the first four groups to be (2,2.5) and in the last four groups to be (3.5,4), and take $\alpha=0.0025$ (say). Then we have a significant F:
Yet the smallest p-value on the pairwise comparisons is not significant at that level:
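That example can be checked directly from the numbers given above; here is a sketch (the pooled-variance contrast is computed by hand, since it is not a built-in scipy routine):

```python
import numpy as np
from scipy import stats
from itertools import combinations

# Eight groups with n_i = 2: four at (2, 2.5), four at (3.5, 4)
groups = [np.array([2.0, 2.5])] * 4 + [np.array([3.5, 4.0])] * 4
alpha = 0.0025

# Overall ANOVA: F ≈ 10.29 on (7, 8) d.f., p ≈ 0.0019 < alpha
F, p_f = stats.f_oneway(*groups)

# Pairwise contrasts using the ANOVA pooled within-group variance (MSE),
# each on N - k = 8 d.f.
N = sum(len(g) for g in groups)
k = len(groups)
mse = sum(((g - g.mean()) ** 2).sum() for g in groups) / (N - k)

pvals = []
for a, b in combinations(groups, 2):
    se = np.sqrt(mse * (1 / len(a) + 1 / len(b)))
    t = (a.mean() - b.mean()) / se
    pvals.append(2 * stats.t.sf(abs(t), N - k))

# Smallest contrast p-value ≈ 0.0028 > alpha: the F is significant
# at alpha = 0.0025, yet no pairwise contrast is
```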