Solved – How to obtain REML estimates in a glmer? (lme4 R package)

lme4-nlme, maximum-likelihood, reml

I'm trying to fit a model with the function glmer (lme4 package, version 1.1-7) in R using REML. I tried to use the argument method=REML, but this argument is deprecated.

It seems that the way to obtain greater accuracy in the evaluation of the log-likelihood is by means of adaptive Gauss-Hermite quadrature (i.e. the argument nAGQ > 1). However, nAGQ > 1 is only available for models with a single, scalar random-effects term, and my model has more than one random-effects term.

How can I do this in glmer for a model with more than one random-effects term?
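For reference, a minimal sketch of the problem using the cbpp data shipped with lme4 (not my actual data):

library(lme4)

## single scalar random-effects term: nAGQ > 1 is allowed
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial, nAGQ = 9)

## add a second (observation-level) random-effects term: nAGQ > 1 now fails
cbpp$obs <- factor(seq_len(nrow(cbpp)))
try(glmer(cbind(incidence, size - incidence) ~ period + (1 | herd) + (1 | obs),
          data = cbpp, family = binomial, nAGQ = 9))
## Error: nAGQ > 1 is only available for models with a single, scalar random-effects term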

Best Answer

You have several different problems here, which range from computational to statistical. It would help if your question gave more context (do you have reasons to believe that the default Laplace-approximation calculation will be unsuitable? What is your need for REML rather than ML estimates?), but I'll try to answer in general.

REML is not clearly defined for (non-Gaussian) GLMMs; see e.g. the relevant section of the GLMM FAQ. In principle you can get GREML ("generalized restricted maximum likelihood", as coined by Millar 2011) estimates of the variance components by integrating the likelihood over the fixed-effect parameters; this can be done in AD Model Builder, or via Bayesian approaches such as those mentioned at the end of this answer, but as far as I'm aware it's not otherwise available.

Before spending a lot of time worrying about REML, I would try to work out how much difference it is likely to make. It is true in the LMM context that REML estimates of the variance components are unbiased while ML estimates are biased downward, but the magnitude of the bias is typically a factor on the order of $n/(n-p)$ (where $n$ is the number of observations and $p$ is the number of fixed-effect parameters). For example, for the sleepstudy data set that comes with lme4 (admittedly a well-behaved data set):

library(lme4)
## REML fit (the default, REML = TRUE)
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
VarCorr(fm1)
## Groups   Name        Std.Dev. Corr 
## Subject  (Intercept) 24.7404       
##          Days         5.9221  0.066 
## Residual             25.5918  

Comparing with the ML estimates below, the difference is about 4% (a ratio of about 1.04), larger than $n/(n-p) = 180/178 \approx 1.01$, but not something I would really worry about, especially since the confidence intervals are much wider than that (approx. 14-37 for the intercept and 4-9 for the slope standard deviations).

fm1_ML <- update(fm1, REML = FALSE)  ## refit the same model by maximum likelihood
VarCorr(fm1_ML)
## Groups   Name        Std.Dev. Corr 
## Subject  (Intercept) 23.7806       
##          Days         5.7168  0.081
## Residual             25.5918       
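Confidence intervals like the ones quoted above (approx. 14-37 and 4-9) can be obtained by likelihood profiling; a minimal sketch (parm = "theta_" restricts the output to the variance-covariance parameters, and profiling takes a few seconds):

## profile confidence intervals for the random-effects standard deviations,
## their correlation, and the residual standard deviation
confint(fm1, parm = "theta_", oldNames = FALSE)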

Is $n/(n-p)$ large in your case? Obviously it would be nice to be able to compute the REML (or REML-analogue) estimates for your own data so you could be sure the difference isn't large, but I have to say that this isn't the sort of thing that keeps me up at night ...


Since you indicated that AGHQ is a secondary concern, I've moved this part of the answer to the end:

  • Adaptive Gauss-Hermite quadrature for multiple random-effects terms is not available in any R package that I'm aware of. The additional accuracy of AGHQ is really only necessary when the effective sample size per cluster is small, e.g. when you have binary outcomes and a small number of observations per group. If you are concerned about the accuracy of the Laplace approximation and have a complex design, I would say your best bet is to compare the results of the Laplace approximation with a Bayesian fit such as those implemented in the MCMCglmm, brms, or rstanarm packages (a sketch of such a comparison follows).
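As an illustration of that kind of cross-check (not the OP's model), here is a sketch using the grouseticks data shipped with lme4, a Poisson GLMM with three random-effects terms; the brms call uses default priors and sampler settings, which would need more thought in a real analysis:

library(lme4)
library(brms)

## Laplace-approximation fit: Poisson GLMM with three random-effects terms
form <- TICKS ~ YEAR + HEIGHT + (1 | BROOD) + (1 | INDEX) + (1 | LOCATION)
fit_laplace <- glmer(form, family = poisson, data = grouseticks)

## Bayesian fit of the same model (slow, but gives a benchmark for the
## random-effects standard deviations)
fit_bayes <- brm(form, family = poisson(), data = grouseticks)

VarCorr(fit_laplace)  ## random-effects SDs from the Laplace fit
summary(fit_bayes)    ## posterior summaries, including the group-level SDs

If the two sets of standard-deviation estimates are close, the Laplace approximation is probably adequate for the design.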