ANOVA vs Linear Regression – Understanding Differences in Research Methodology

anova, regression

ANOVA is equivalent to linear regression with the use of suitable dummy variables. The conclusions remain the same irrespective of whether you use ANOVA or linear regression.

In light of their equivalence, is there any reason why ANOVA is used instead of linear regression?

Note: I am particularly interested in hearing about technical reasons for the use of ANOVA instead of linear regression.

Edit

Here is one example using one-way ANOVA. Suppose you want to know whether the average height of males and females is the same. To test this hypothesis, you would collect data from a random sample of males and females (say 30 each) and perform the ANOVA (i.e., compute the sums of squares for sex and error) to decide whether an effect exists.

You could also use linear regression to test for this as follows:

Define $\text{Sex} = 1$ if the respondent is male and $0$ otherwise.
$$
\text{Height} = \text{Intercept} + \beta \cdot \text{Sex} + \text{error}
$$

where: $\text{error}\sim\mathcal N(0,\sigma^2)$

Then a test of whether $\beta = 0$ is an equivalent test of your hypothesis.
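To make the equivalence concrete, here is a minimal sketch in plain Python. The heights are simulated (the means, spread, and seed are made up for illustration), and the helper names `anova_f` and `slope_t` are hypothetical, not from any library. It computes the one-way ANOVA F statistic from the sums of squares and the t statistic for $\beta$ from the dummy-variable regression; with two groups, $F = t^2$, and $\beta$ is simply the difference in group means.

```python
import random
from statistics import mean

random.seed(0)
# Simulated heights in cm; these parameters are assumptions for illustration only.
males = [random.gauss(176, 7) for _ in range(30)]
females = [random.gauss(163, 7) for _ in range(30)]


def anova_f(groups):
    """One-way ANOVA F statistic from between- and within-group sums of squares."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def slope_t(x, y):
    """OLS fit of y = a + b*x; returns (b, t statistic for H0: b = 0)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = (rss / (n - 2) / sxx) ** 0.5  # standard error of the slope
    return b, b / se_b


# Dummy coding from the question: Sex = 1 for males, 0 otherwise.
sex = [1] * len(males) + [0] * len(females)
height = males + females

f_stat = anova_f([males, females])
beta, t_stat = slope_t(sex, height)

print(f"ANOVA F        = {f_stat:.6f}")
print(f"regression t^2 = {t_stat ** 2:.6f}")
print(f"beta = {beta:.3f}, difference in means = {mean(males) - mean(females):.3f}")
```

The two printed statistics agree up to floating-point error, so the p-values (from $F_{1,58}$ and a two-sided $t_{58}$) agree as well.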

Best Answer

As an economist, I was taught the analysis of variance (ANOVA) in relation to linear regression, which is how it is usually understood in the field (e.g. in Arthur Goldberger's A Course in Econometrics). Economists and econometricians typically view ANOVA as uninteresting and prefer to move straight to regression models. From the perspective of linear (or even generalised linear) models, ANOVA groups coefficients into batches, with each batch corresponding to a "source of variation" in ANOVA terminology.
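As a sketch of the batching idea for a single factor with $k$ levels (the symbols $D_{ij}$, $\mathrm{RSS}_0$ and $\mathrm{RSS}_1$ are notation introduced here for illustration, not taken from Goldberger or Gelman), the regression form is
$$
y_i = \mu + \sum_{j=2}^{k} \beta_j D_{ij} + \varepsilon_i
$$

where $D_{ij} = 1$ if observation $i$ belongs to level $j$ and $0$ otherwise. The batch $(\beta_2, \dots, \beta_k)$ is the "source of variation" for that factor, and the one-way ANOVA F-test is the regression test of $H_0\colon \beta_2 = \dots = \beta_k = 0$:
$$
F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)/(k-1)}{\mathrm{RSS}_1/(n-k)}
$$

where $\mathrm{RSS}_0$ is the residual sum of squares of the intercept-only model, $\mathrm{RSS}_1$ that of the full model, and $n$ the number of observations.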

Generally, you can replicate the inferences you would obtain from ANOVA using regression, but not always with OLS regression. Multilevel models are needed for analysing hierarchical data structures such as "split-plot designs," where between-group effects are compared to group-level errors and within-group effects are compared to data-level errors. Gelman's paper [1] goes into great detail about this problem and effectively argues that ANOVA is an important statistical tool that should still be taught for its own sake.

In particular, Gelman argues that ANOVA is a way of understanding and structuring multilevel models. On this view, ANOVA is not an alternative to regression but a tool for summarizing complex high-dimensional inferences and for exploratory data analysis.

Gelman is a well-respected statistician, and some credence should be given to his view. However, almost all of the empirical work that I do would be equally well served by linear regression, so I fall firmly into the camp that views ANOVA as a little bit pointless. Disciplines with complex study designs (e.g. psychology) may nonetheless find it useful.

[1] Gelman, A. (2005). Analysis of variance: why it is more important than ever (with discussion). Annals of Statistics 33, 1–53. doi:10.1214/009053604000001048
