Solved – Relation between omnibus test and multiple comparisons

hypothesis-testing, multiple-comparisons

Wikipedia says

Methods which rely on an omnibus test before proceeding to
multiple comparisons. Typically these methods require a significant ANOVA/Tukey's range test before proceeding to multiple
comparisons. These methods have "weak" control of Type I error.

Also

The F-test in ANOVA is an example of an omnibus test, which tests the
overall significance of the model. A significant F test means that, among
the tested means, at least two differ significantly, but this result
doesn't specify exactly which means differ from one another. The test of
overall mean differences is carried out via the F statistic
($F = MSB/MSW$). In order to determine which mean differs from another,
or which contrasts of means are significantly different, post hoc tests
(multiple comparison tests) or planned tests should be conducted after
obtaining a significant omnibus F test. One may consider using the simple
Bonferroni correction or another suitable correction.

So an omnibus test is used to test overall significance, while multiple comparisons are used to find which particular differences are significant.

But if I understand correctly, the main purpose of multiple comparisons is to test the overall significance, and they can also find which differences are significant. In other words, multiple comparisons can do what an omnibus test can do. Then why do we need an omnibus test?

Best Answer

The purpose of multiple comparisons procedures is not to test the overall significance, but to test individual effects for significance while controlling the experimentwise error rate. It's quite possible for e.g. an omnibus F-test to be significant at a given level while none of the pairwise Tukey tests are; this is discussed here & here.
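To make that concrete, here's a minimal simulation sketch (assuming numpy, scipy & statsmodels are available; the group sizes & the diffuse pattern of true means are my illustrative choices, not from the linked discussions) that counts how often the omnibus F-test rejects while no Tukey pairwise comparison does:

```python
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
n, k, alpha = 10, 4, 0.05
means = np.array([0.0, 0.3, 0.3, 0.6])  # diffuse, borderline effects (illustrative)

f_sig = f_sig_tukey_none = 0
for _ in range(2000):
    groups = np.arange(k).repeat(n)
    data = rng.normal(means.repeat(n), 1.0)
    _, p = f_oneway(*(data[groups == g] for g in range(k)))
    if p < alpha:
        f_sig += 1
        # Tukey HSD on the same data; .reject flags significant pairs
        if not pairwise_tukeyhsd(data, groups, alpha=alpha).reject.any():
            f_sig_tukey_none += 1

print(f"F-test rejected {f_sig} times; "
      f"in {f_sig_tukey_none} of those no Tukey pair was significant")
```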

Consider a very simple example: testing whether two independent normal variates with unit variance both have mean zero, so that

$$H_0: \mu_1=0 \land \mu_2=0$$ $$H_1: \mu_1 \neq 0 \lor \mu_2\neq 0$$

Test #1: reject when $$X_1^2+X_2^2 \geq F^{-1}_{\chi^2_2}(1-\alpha) $$

Test #2: reject when $$\max(|X_1|, |X_2|) \geq F^{-1}_{\mathcal{N}} \left(1-\frac{1-\sqrt{1-\alpha}}{2}\right)$$

(using the Šidák correction to maintain overall size). Both tests have the same size ($\alpha$) but different rejection regions:

[Figure: plot of the two rejection regions in the $(X_1, X_2)$ plane]
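For concreteness, a short numerical sketch (Python with scipy; the names c1 & c2 are mine) of the two cutoffs, together with a Monte Carlo check that both tests do have size $\alpha$ under $H_0$:

```python
import numpy as np
from scipy.stats import chi2, norm

alpha = 0.05
c1 = chi2.ppf(1 - alpha, df=2)                   # Test #1: cutoff for X1^2 + X2^2
c2 = norm.ppf(1 - (1 - np.sqrt(1 - alpha)) / 2)  # Test #2: Sidak cutoff for each |X_i|

rng = np.random.default_rng(1)
X = rng.normal(size=(1_000_000, 2))              # draws under H0: mu1 = mu2 = 0
size1 = np.mean((X ** 2).sum(axis=1) >= c1)
size2 = np.mean(np.abs(X).max(axis=1) >= c2)
print(f"c1 = {c1:.3f}, c2 = {c2:.3f}")
print(f"empirical size: Test #1 {size1:.4f}, Test #2 {size2:.4f}")  # both ~ 0.05
```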

Test #1 is a typical omnibus test: more powerful than Test #2 when both effects are moderately large but neither is very large. Test #2 is a typical multiple-comparisons test: more powerful than Test #1 when either effect is large & the other small, & also enabling independent testing of the individual components of the global null.
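This crossover can be computed exactly: under the alternative, Test #1's statistic is noncentral $\chi^2_2$ with noncentrality $\mu_1^2+\mu_2^2$, & Test #2 fails to reject only when both $|X_i|$ stay below their cutoff, so its power follows from the product of the two marginal acceptance probabilities. A sketch (the two alternatives are my illustrative picks):

```python
import numpy as np
from scipy.stats import chi2, norm, ncx2

alpha = 0.05
c1 = chi2.ppf(1 - alpha, df=2)
c2 = norm.ppf(1 - (1 - np.sqrt(1 - alpha)) / 2)

def power_test1(mu1, mu2):
    # X1^2 + X2^2 ~ noncentral chi-square, df = 2, noncentrality mu1^2 + mu2^2
    return ncx2.sf(c1, 2, mu1 ** 2 + mu2 ** 2)

def power_test2(mu1, mu2):
    # reject unless both |X_i| stay below c2
    keep = [norm.cdf(c2 - m) - norm.cdf(-c2 - m) for m in (mu1, mu2)]
    return 1 - keep[0] * keep[1]

for mu in [(2.5, 2.5), (3.0, 0.0)]:  # both-moderate vs one-large-one-zero
    print(mu, f"Test #1: {power_test1(*mu):.3f}", f"Test #2: {power_test2(*mu):.3f}")
```

At the diagonal alternative Test #1 should come out ahead (roughly 0.90 vs 0.84); on the axis the ordering flips (roughly 0.77 vs 0.78).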

So two valid test procedures that control the experimentwise error rate at $\alpha$ are these:

(1) Perform Test #1 & either (a) don't reject the global null, or (b) reject the global null, then (& only in this case) perform Test #2 & either (i) reject neither component, (ii) reject the first component, (iii) reject the second component, or (iv) reject both components.

(2) Perform only Test #2 & either (a) reject neither component (thus failing to reject the global null), (b) reject the first component (thus also rejecting the global null), (c) reject the second component (thus also rejecting the global null), or (d) reject both components (thus also rejecting the global null).

You can't have your cake & eat it by performing Test #1, not rejecting the global null, yet still going on to perform Test #2: the effective rejection region for the global null then becomes the union of the two tests' regions, so the Type I error rate is greater than $\alpha$ for this procedure.
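A quick Monte Carlo under the global null (the procedure labels are mine) illustrates this: procedures (1) & (2) hold the $\alpha$ level, while the "have your cake & eat it" variant, whose global rejection region is the union of the two tests' regions, exceeds it:

```python
import numpy as np
from scipy.stats import chi2, norm

alpha = 0.05
c1 = chi2.ppf(1 - alpha, df=2)
c2 = norm.ppf(1 - (1 - np.sqrt(1 - alpha)) / 2)

rng = np.random.default_rng(2)
X = rng.normal(size=(1_000_000, 2))        # draws under the global null
omni = (X ** 2).sum(axis=1) >= c1          # Test #1 rejects
comp = (np.abs(X) >= c2).any(axis=1)       # Test #2 rejects some component

err1 = np.mean(omni)                       # procedure (1): any rejection requires Test #1 first
err2 = np.mean(comp)                       # procedure (2): Test #2 alone
err_invalid = np.mean(omni | comp)         # invalid: reject if either test rejects
print(f"procedure (1): {err1:.4f}")        # ~ alpha
print(f"procedure (2): {err2:.4f}")        # ~ alpha
print(f"invalid:       {err_invalid:.4f}") # > alpha
```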