Solved – ANOVA and Tukey HSD vs Linear Regression

anova, least squares, multiple-comparisons, regression, tukey-hsd-test

When analyzing the difference in means between $k$ groups, a one-way ANOVA is equivalent to a linear regression with $k-1$ indicator variables for group membership:

$$
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_{k-1} X_{k-1} + \epsilon
$$
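
For reference, here is how the coefficients map onto group means. This assumes dummy (treatment) coding in which $X_j = 1$ for observations in group $j$ and $0$ otherwise, so that group $k$, the one without an indicator, acts as the reference level. Writing $\mu_j$ for the mean of group $j$ (notation introduced here for illustration):

$$
\beta_0 = \mu_k, \qquad \beta_j = \mu_j - \mu_k \quad (j = 1, \dots, k-1)
$$

Under that coding, the test of each $\beta_j$ is a comparison of group $j$ against the reference group, so the $k-1$ coefficient tests cover only some of the $k(k-1)/2$ possible pairwise comparisons.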

It's my understanding that if specific contrasts aren't pre-specified, researchers should first examine the omnibus ANOVA p-value and, if it falls below their $\alpha$ threshold, then perform Tukey's HSD test for all pairwise comparisons. Tukey's test is used at the second step to keep the overall error rate across those comparisons at $\alpha$.

This procedure seems similar to first checking the linear regression's F-statistic and p-value (these should be the same as the ANOVA's) and then examining the p-values for the individual regression coefficients ($\beta_1$, etc.).
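
The claim that the regression F-statistic matches the ANOVA F-statistic can be checked numerically. The following is a minimal sketch in plain Python (standard library only). The data values, the group names, and the helper functions anova_f, solve, and regression_f are all made up for illustration, and the last group is assumed to be the reference level.

```python
# Hypothetical data: three groups, values invented for illustration only.
groups = {
    "a": [4.1, 5.0, 4.6, 5.3],
    "b": [6.2, 5.8, 6.9, 6.4],
    "c": [5.1, 4.9, 5.6, 5.2],
}

def mean(xs):
    return sum(xs) / len(xs)

def anova_f(groups):
    """One-way ANOVA F from between- and within-group sums of squares."""
    all_y = [y for ys in groups.values() for y in ys]
    grand = mean(all_y)
    k, n = len(groups), len(all_y)
    ss_between = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())
    ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def regression_f(groups):
    """Overall F from least squares on an intercept plus k-1 indicator columns."""
    others = list(groups)[:-1]  # the last group is the reference level
    X, y = [], []
    for name, ys in groups.items():
        for value in ys:
            X.append([1.0] + [1.0 if name == o else 0.0 for o in others])
            y.append(value)
    p = len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    grand = mean(y)
    ss_model = sum((f - grand) ** 2 for f in fitted)
    ss_resid = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    n = len(y)
    return beta, (ss_model / (p - 1)) / (ss_resid / (n - p))

beta, f_reg = regression_f(groups)
f_anova = anova_f(groups)
print("ANOVA F:     ", f_anova)
print("Regression F:", f_reg)
print("Coefficients:", beta)
```

The two F values agree up to floating-point error. The fitted intercept equals the mean of the reference group, and each remaining coefficient equals the difference between one group's mean and the reference mean, which is why the coefficient p-values correspond to comparisons against the reference group rather than to all pairs.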

With more than two groups, multiple comparisons can become an issue because several tests are performed. The Wikipedia page for Tukey's HSD test says Tukey's test accounts for multiple comparisons. Is there any adjustment for multiple comparisons built into the p-values for individual regression coefficients? If not, is it prudent to apply an adjustment (Bonferroni, FDR, etc.) to the coefficient p-values? I usually don't see discussion of multiple comparisons adjustment for regression coefficients, so I wasn't sure if this is already accounted for or if many just don't consider it a needed adjustment.

Best Answer

Is there any adjustment for multiple comparisons built into the p-values for individual regression coefficients? If not, is it prudent to apply an adjustment (Bonferroni, FDR, etc.) to the coefficient p-values?

In short, yes, an adjustment can be applied. I usually use linear regression for everything, including designs that could be analyzed with an ANOVA. I use R, so I fit models with the lm function and then estimate specific pairwise comparisons with the emmeans package. The package documentation discusses adjusting p-values for multiple comparisons at length. See the vignette section here: https://cran.r-project.org/web/packages/emmeans/vignettes/confidence-intervals.html#adjust

I usually don't see discussion of multiple comparisons adjustment for regression coefficients

This lack of discussion is probably due to your field using ANOVA for these problems more often than regression models. It could also be an artifact of the software your field uses.

I wasn't sure if this is already accounted for or if many just don't consider it a needed adjustment.

It is not already accounted for in the estimation of the model: the p-value reported for each coefficient is unadjusted. Whether an adjustment is needed is a theoretical question. But since, as you note, the two approaches are equivalent, if you or your field think multiple comparisons should be adjusted for in ANOVA, then the same applies to linear regression.
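
If you do decide to adjust the coefficient p-values, the Bonferroni and Holm corrections are simple enough to sketch by hand. The snippet below is an illustration in plain Python, not what emmeans does internally; the function names and the raw p-values are hypothetical.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Holm step-down adjustment; results are returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

# Hypothetical unadjusted p-values for three regression coefficients.
raw = [0.010, 0.040, 0.030]

print("Bonferroni:", [round(p, 4) for p in bonferroni(raw)])
print("Holm:      ", [round(p, 4) for p in holm(raw)])
```

Both methods control the family-wise error rate, and a Holm-adjusted p-value is never larger than the corresponding Bonferroni one. In R, the same adjustments should be available without hand-rolling them, for example through p.adjust or the adjust argument described in the emmeans vignette linked above.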