ANOVA Comparisons – Can All Possible Pairwise Comparisons Be Planned Comparisons in ANOVA?

anova · contrasts · multiple-comparisons · type-i-and-ii-errors

Let's say we have three intervention groups (condition A, condition B, and condition C/control) to be analyzed with ANOVA, and we are theoretically interested in the difference within each of the possible pairs (i.e., A vs. B, B vs. C, and A vs. C). Since all three comparisons follow from our hypotheses, I thought it would be fine to describe them as planned comparisons. However, these planned comparisons look just like post-hoc tests (which test all possible pairs for exploratory purposes). Is it still acceptable to treat them as planned comparisons, with no need to adjust the p-values? Or should I regard them as post-hoc tests and adjust the p-values for multiple testing?

Summary of the suggestions:

  • I am starting to understand that what I described is not really how planned comparisons should be done. When running planned contrasts, I might start by comparing the intervention groups (A & B) against the control group (C), followed by a comparison of the two intervention groups (A vs. B). Alternatively, I can focus on a specific pair, although the maximum number of contrasts should be k − 1, as BruceET has suggested. (15/Sep/2021)
  • Thank you so much for your brilliant insights, Tanner! It is important to adjust the significance level to avoid false findings when running multiple tests. Together with COOLSerdash's critical suggestions, I may not need to adjust the significance level when I have a specific hypothesis for each comparison. (17/Sep/2021)

Best Answer

It's not only possible, it's explicitly recommended (see Ruxton & Beauchamp 2008).

First, if the omnibus test of homogeneity across all groups (e.g., the overall ANOVA $F$-test) is not of interest, consider ignoring it or not running it in the first place. Second, if each planned comparison tests a different specific hypothesis, it is actually controversial whether formal control of the experimentwise Type I error rate (EER) is required. Some texts do not consider it necessary (Kirk 1995, Quinn & Keough 2002, Rothman 1990, Rubin 2021, Sokal & Rohlf 1995). Ruxton & Beauchamp (2008) recommend not controlling the EER if the set of pre-planned contrasts is orthogonal. If all possible pairwise comparisons between groups are planned, however, the set of contrasts is not orthogonal, and so Ruxton & Beauchamp (2008) recommend controlling the EER.
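To make the orthogonality point concrete, here is a small sketch (using NumPy; the contrast weights and helper function are my own illustration, not part of the answer above). Two contrasts are orthogonal when their weight vectors have a zero dot product; with three groups (A, B, C), the k − 1 = 2 contrasts "interventions vs. control" and "A vs. B" form an orthogonal set, while the full set of three pairwise comparisons does not:

```python
import numpy as np

# Contrast weights over the groups (A, B, C); each row sums to zero.
# Planned, orthogonal set (k - 1 = 2 contrasts):
orthogonal_set = np.array([
    [1, 1, -2],   # (A + B) vs. control C
    [1, -1, 0],   # A vs. B
])

# All three pairwise comparisons: A-B, B-C, A-C
pairwise_set = np.array([
    [1, -1, 0],
    [0, 1, -1],
    [1, 0, -1],
])

def mutually_orthogonal(contrasts):
    """True if every pair of contrast vectors has a zero dot product."""
    k = len(contrasts)
    return all(
        np.dot(contrasts[i], contrasts[j]) == 0
        for i in range(k) for j in range(i + 1, k)
    )

print(mutually_orthogonal(orthogonal_set))  # True
print(mutually_orthogonal(pairwise_set))    # False
```

This is why, under the Ruxton & Beauchamp (2008) rule, the all-pairwise design in the question would call for EER control while a nested set of planned contrasts would not.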

Rubin (2021), on the other hand, argues that no alpha adjustment should be made in the case of individual testing. He defines individual testing as the situation in which each individual result must be significant on its own in order to reject its associated individual null hypothesis. This seems to be the case here, since you want to test each pairwise group difference individually. Personally, I find his arguments convincing and have since changed my own opinion on the matter.

References

Kirk, R. E. (1995). Experimental design. Pacific Grove, CA: Brooks/Cole.

Quinn, G. P., & Keough, M. J. (2002). Experimental design and data analysis for biologists. Cambridge, UK: Cambridge University Press.

Rothman, K. J. (1990). No adjustments are needed for multiple comparisons. Epidemiology, 43–46. (link)

Rubin, M. (2021). When to adjust alpha during multiple testing: A consideration of disjunction, conjunction, and individual testing. Synthese, 1–32. (link)

Ruxton, G. D., & Beauchamp, G. (2008). Time for some a priori thinking about post hoc testing. Behavioral Ecology, 19(3), 690–693. (link)

Sokal, R. R., & Rohlf, F. J. (1995). Biometry (3rd ed.). New York: W. H. Freeman.