Maximum Likelihood – Why Restricted Maximum Likelihood Yields a Better Unbiased Variance Estimate

hierarchical-bayesian, maximum-likelihood, mixed-model, reml, unbiased-estimator

I'm reading Doug Bates' theory paper on R's lme4 package to better understand the nitty-gritty of mixed models, and came across an intriguing result that I'd like to understand better, about using restricted maximum likelihood (REML) to estimate variance.

In section 3.3 on the REML criterion, he states that the use of REML in variance estimation is closely related to the use of a degrees of freedom correction when estimating variance from residual deviations in a fitted linear model. In particular, "although not usually derived this way", the degrees of freedom correction can be derived by estimating the variance through optimization of a "REML criterion" (Eq. (28)). The REML criterion is essentially just the likelihood, but the linear fit parameters have been eliminated by marginalizing (instead of setting them equal to the fit estimate, which would give the biased sample variance).
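For orientation, the construction can be written compactly. In my notation (a sketch, not a quotation of Bates' Eq. (28)): for a linear model with coefficient vector $\beta$ and residual variance $\sigma^2$, the REML criterion is the likelihood with $\beta$ integrated out,

$$L_R(\sigma^2 \mid y) = \int L(\beta, \sigma^2 \mid y)\, d\beta,$$

which is then optimized over $\sigma^2$ alone, rather than plugging $\hat{\beta}$ into $L$.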

I did the math and verified the claimed result for a simple linear model with only fixed effects (the calculation is sketched below). What I'm struggling with is the interpretation. Is there some perspective from which it is natural to derive a variance estimate by optimizing a likelihood where the fit parameters have been marginalized out? It feels sort of Bayesian, as though I'm thinking of the likelihood as a posterior and marginalizing out the fit parameters as though they were random variables.

Or is the justification primarily just mathematical – it works in the linear case but is also generalizable?
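For reference, here is the calculation for the simplest case, an intercept-only model $y_i \sim N(\mu, \sigma^2)$, $i = 1, \dots, n$ (my reconstruction). Using $\sum_i (y_i - \mu)^2 = \sum_i (y_i - \bar{y})^2 + n(\mu - \bar{y})^2$, the Gaussian integral over $\mu$ gives

$$L_R(\sigma^2) = \int (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{\sum_i (y_i - \mu)^2}{2\sigma^2}\right) d\mu = n^{-1/2}\,(2\pi\sigma^2)^{-(n-1)/2} \exp\left(-\frac{\sum_i (y_i - \bar{y})^2}{2\sigma^2}\right),$$

and solving $\partial \log L_R / \partial \sigma^2 = 0$ gives $\hat{\sigma}^2 = \sum_i (y_i - \bar{y})^2 / (n-1)$. Marginalization has lowered the exponent from $n/2$ to $(n-1)/2$, which is exactly the degrees-of-freedom correction.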

Best Answer

The bias in the variance stems from the fact that the mean has been estimated from the data; the spread of the data around this estimated mean (i.e. the variance) is therefore smaller than the spread of the data around the 'true' mean. See also: Intuitive explanation for dividing by $n-1$ when calculating standard deviation?
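To put a number on it: for i.i.d. observations with variance $\sigma^2$,

$$E\left[\sum_{i=1}^n (y_i - \bar{y})^2\right] = (n-1)\,\sigma^2,$$

so averaging the squared deviations around $\bar{y}$ over $n$ underestimates $\sigma^2$ by the factor $(n-1)/n$, and dividing by $n-1$ instead removes the bias.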

The fixed effects determine the model 'for the mean'. Therefore, if you can find a variance estimate that is derived without estimating the mean from the data (by 'marginalising out' the fixed effects, i.e. the mean), then this underestimation of the spread (i.e. the variance) is mitigated.

This is the 'intuitive' understanding of why REML estimates eliminate the bias: you find an estimate for the variance without using the 'estimated mean'.
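As a quick numerical check (a minimal sketch in Python; the intercept-only model, the simulated data, and the integration window are my own illustrative assumptions, not anything from Bates' paper), optimizing the likelihood with the mean integrated out reproduces the $n-1$ estimator:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

# Simulated data from an intercept-only Gaussian model y_i ~ N(mu, sigma^2).
rng = np.random.default_rng(42)
y = rng.normal(loc=2.0, scale=1.5, size=10)
n, ybar = len(y), y.mean()

def neg_log_marginal_likelihood(sigma2):
    """-log of the likelihood with the mean mu integrated out (flat measure)."""
    def likelihood(mu):
        return (2 * np.pi * sigma2) ** (-n / 2) * np.exp(
            -0.5 * np.sum((y - mu) ** 2) / sigma2
        )
    # In mu the integrand is a Gaussian bump centred at ybar with standard
    # deviation sigma / sqrt(n), so +/- 10 such sds captures all the mass.
    half_width = 10.0 * np.sqrt(sigma2 / n)
    value, _ = quad(likelihood, ybar - half_width, ybar + half_width)
    return -np.log(value)

result = minimize_scalar(neg_log_marginal_likelihood,
                         bounds=(0.1, 20.0), method="bounded")
print(result.x)        # marginal-likelihood (REML-style) estimate of sigma^2
print(y.var(ddof=1))   # unbiased sample variance with the n - 1 divisor: matches
```

Replacing the integral with the plug-in value $L(\bar{y}, \sigma^2)$ in the same script gives the biased $n$-divisor estimate instead, which is the contrast the question is about.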