ANOVA – Does ANOVA Rely on the Method of Moments or Maximum Likelihood?

anova, maximum likelihood, method of moments, mixed model

I see mentioned in various places that ANOVA does its estimation using the method of moments.

I am confused by that assertion because, even though I am not familiar with the method of moments, my understanding is that it is something different from, and not equivalent to, maximum likelihood; on the other hand, ANOVA can be seen as a linear regression with categorical predictors, and OLS estimation of the regression parameters coincides with maximum likelihood (under normally distributed errors).

So:

  1. What qualifies ANOVA procedures as the method of moments?

  2. Given that ANOVA is equivalent to OLS with categorical predictors, isn't it maximum likelihood?

  3. If these two methods somehow turn out to be equivalent in the special case of usual ANOVA, are there some specific ANOVA situations when the difference becomes important? Unbalanced design? Repeated measures? Mixed (between-subjects + within-subjects) design?

Best Answer

I first encountered the ANOVA when I was a Master's student at Oxford in 1978. Modern approaches, by teaching continuous and categorical variables together in the multiple regression model, make it difficult for younger statisticians to understand what is going on. So it can be helpful to go back to simpler times.

In its original form, the ANOVA is an exercise in arithmetic whereby you break up the total sum of squares into pieces associated with treatments, blocks, interactions, whatever. In a balanced setting, sums of squares with an intuitive meaning (like SSB for blocks and SST for treatments) add up to the adjusted total sum of squares. All of this works thanks to Cochran's Theorem. Using Cochran, you can work out the expected values of these terms under the usual null hypotheses, and the F statistics flow from there.
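To make the bookkeeping concrete, here is the decomposition for a balanced randomized block design with $t$ treatments and $b$ blocks, one observation per cell (the notation is mine, not from the original answer):

$$
\sum_{i=1}^{t}\sum_{j=1}^{b}\left(y_{ij}-\bar{y}_{\cdot\cdot}\right)^{2}
\;=\;
\underbrace{b\sum_{i=1}^{t}\left(\bar{y}_{i\cdot}-\bar{y}_{\cdot\cdot}\right)^{2}}_{SS_{\text{treat}}}
\;+\;
\underbrace{t\sum_{j=1}^{b}\left(\bar{y}_{\cdot j}-\bar{y}_{\cdot\cdot}\right)^{2}}_{SS_{\text{block}}}
\;+\;
SS_{\text{error}}.
$$

Under normality, Cochran's Theorem makes these pieces independent scaled chi-squared variables, so the ratio $F = \big(SS_{\text{treat}}/(t-1)\big)\big/\big(SS_{\text{error}}/((t-1)(b-1))\big)$ has an F distribution under the null hypothesis of no treatment effect.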

As a bonus, once you start thinking about Cochran and sums of squares, it makes sense to go on slicing and dicing your treatment sums of squares using orthogonal contrasts. Every entry in the ANOVA table should have an interpretation of interest to the statistician and yield a testable hypothesis.
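To see the slicing in action, here is a minimal numpy sketch (my own illustration, not from the answer) checking that the single-degree-of-freedom sums of squares from a complete set of orthogonal contrasts add back up to the treatment sum of squares in a balanced one-way layout:

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 4, 10                                   # 4 treatments, 10 replicates each (balanced)
y = rng.normal(loc=[0, 1, 1, 2], size=(n, k))  # columns are treatment groups

group_means = y.mean(axis=0)
grand_mean = y.mean()
ss_treat = n * np.sum((group_means - grand_mean) ** 2)

# A complete set of k-1 = 3 mutually orthogonal contrasts (rows sum to zero,
# and pairwise dot products are zero):
contrasts = np.array([
    [1, -1,  0,  0],
    [1,  1, -2,  0],
    [1,  1,  1, -3],
], dtype=float)

# Single-df sum of squares for contrast c: n * (c . means)^2 / (c . c)
ss_contrast = n * (contrasts @ group_means) ** 2 / (contrasts ** 2).sum(axis=1)

print(ss_treat, ss_contrast.sum())  # the two agree for a full orthogonal set
```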

I recently wrote an answer where the difference between MOM and ML methods arose. The question turned on estimating random effects models. At this point, the traditional ANOVA approach totally parts company with maximum likelihood estimation, and the estimates of the effects are no longer the same. When the design is unbalanced, you don't get the same F statistics either.

Back in the day, when statisticians wanted to estimate random effects from split-plot or repeated measures designs, the random effects variance was computed from the mean squares of the ANOVA table. So if you have a plot with variance $\sigma^2_p$ and the residual variance is $\sigma^2$, you might have that the expected value of the mean square ("expected mean square", EMS) for plots is $\sigma^2 + n\sigma_p^2$, with $n$ the number of splits per plot. You set the observed mean square equal to its expectation and solve for $\hat{\sigma}_p^2$; equating sample moments to their expected values is precisely the method of moments, so the ANOVA yields a method of moments estimator for the random effect variance. Nowadays, we tend to solve such problems with mixed effects models, and the variance components are obtained through maximum likelihood estimation or REML.
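A minimal simulation of that recipe, assuming a balanced one-way random effects layout (the numbers and names are my own):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 30, 5                 # 30 plots, 5 splits per plot (balanced)
sigma2_p, sigma2 = 4.0, 1.0  # true plot and residual variances

plot_effects = rng.normal(0, np.sqrt(sigma2_p), size=(p, 1))
y = plot_effects + rng.normal(0, np.sqrt(sigma2), size=(p, n))

plot_means = y.mean(axis=1)
grand_mean = y.mean()

# ANOVA mean squares for a one-way random effects layout
ms_plot  = n * np.sum((plot_means - grand_mean) ** 2) / (p - 1)
ms_error = np.sum((y - plot_means[:, None]) ** 2) / (p * (n - 1))

# EMS equations: E[ms_plot] = sigma^2 + n*sigma_p^2, E[ms_error] = sigma^2.
# Equate observed mean squares to their expectations and solve (method of moments):
sigma2_hat   = ms_error
sigma2_p_hat = (ms_plot - ms_error) / n

print(sigma2_hat, sigma2_p_hat)   # close to the true values 1.0 and 4.0
```

A well-known quirk of this moment estimator is that it goes negative whenever $MS_{\text{plot}} < MS_{\text{error}}$, which cannot happen with ML or REML and is one practical reason the likelihood-based approaches took over.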

The ANOVA as such is not a method of moments procedure. It turns on splitting the sum of squares (or more generally, a quadratic form of the response) into components that yield meaningful hypotheses. It depends strongly on normality since we want the sums of squares to have chi-squared distributions for the F tests to work.
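Stated compactly (my paraphrase of the textbook result): if $y \sim N(0, \sigma^{2} I_{N})$ and

$$
y^{\top}y=\sum_{i=1}^{m} y^{\top}A_{i}\,y
\qquad\text{with}\qquad
\sum_{i=1}^{m}\operatorname{rank}(A_{i})=N,
$$

then Cochran's Theorem says the quadratic forms $y^{\top}A_{i}\,y/\sigma^{2}$ are independent $\chi^{2}$ variables with $\operatorname{rank}(A_{i})$ degrees of freedom, which is exactly what licenses the F ratios in the table.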

The maximum likelihood framework is more general and covers situations, like generalized linear models, where sums of squares do not apply. Some software (like R) invites confusion by attaching anova methods to likelihood ratio tests with asymptotic chi-squared distributions. One can justify the use of the term "anova" there, but strictly speaking, the theory behind it is different.
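For concreteness, here is a sketch of the kind of likelihood ratio test meant here, written in Python with statsmodels rather than R (the data and model are made up for illustration):

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = rng.poisson(np.exp(0.3 + 0.5 * x))   # Poisson response, log link

X_full    = sm.add_constant(x)           # intercept + slope
X_reduced = np.ones((len(y), 1))         # intercept only (nested in the full model)

fit_full    = sm.GLM(y, X_full, family=sm.families.Poisson()).fit()
fit_reduced = sm.GLM(y, X_reduced, family=sm.families.Poisson()).fit()

# Likelihood ratio statistic: twice the log-likelihood difference,
# asymptotically chi-squared with df = difference in parameter count.
lr = 2 * (fit_full.llf - fit_reduced.llf)
p_value = chi2.sf(lr, df=1)
print(lr, p_value)
```

There is no sum of squares anywhere in this calculation; the chi-squared distribution is asymptotic rather than exact, which is the sense in which the theory differs from the classical ANOVA table.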