Multilevel Analysis – When to Set nAGQ = 0 in glmer

Tags: lme4-nlme, multilevel-analysis

According to the documentation for glmer, nAGQ refers to "the number of points per axis for evaluating the adaptive Gauss-Hermite approximation to the log-likelihood". I understand the n in the acronym stands for number of points, while the AGQ stands for Adaptive Gauss-Hermite Quadrature.

Douglas Bates further explains that the difference between nAGQ=0 and nAGQ=1 "is whether the fixed-effects parameters, $\beta$, are optimized in PIRLS or in the nonlinear optimizer."

The general advice seems to be that setting nAGQ = 0 is less accurate than setting nAGQ = 1 (the default), and that where possible setting it higher is better still, although there is a tradeoff: higher values of nAGQ mean longer runtimes and a higher chance of convergence failures.
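To make the comparison concrete, here is a minimal sketch that fits the same model with nAGQ = 0, 1 and 10. It uses the cbpp data set that ships with lme4 purely as a stand-in (it is not the data behind this question), and a model with a single scalar random effect, which is the only case in which glmer allows nAGQ > 1.

```r
library(lme4)

## Stand-in example: cbpp ships with lme4; it is NOT the data from this question.
nagq_values <- c(0, 1, 10)
fits <- lapply(nagq_values, function(q) {
  glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
        data = cbpp, family = binomial, nAGQ = q)
})
names(fits) <- paste0("nAGQ=", nagq_values)

## Fixed-effect estimates side by side (one column per nAGQ setting)
round(sapply(fits, fixef), 3)

## Estimated standard deviation of the herd random intercept
sapply(fits, function(m) sqrt(unlist(VarCorr(m))))
```

If the columns agree to within a small fraction of the standard errors, the cheaper setting is probably adequate for that model; if they do not, that is a sign the choice matters.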

I have in mind scenarios in which the person running the model intends for the results to be published in a scientific paper, and not scenarios in which the person doesn’t care much about accuracy as opposed to runtime. Is setting nAGQ = 0 only appropriate when the model will otherwise not converge (or won't converge in sufficient time), and all other means of resolving the convergence issue have failed? What other factors make it more or less acceptable?

Robert Long has suggested elsewhere on this site that

You might get more accurate results with nAGQ>0. The higher the
better. A good way to assess whether you need to is to take some
samples from your dataset, run the models with nAGQ=0 and nAGQ>0 and
compare the results on the smaller datasets. If you find little
difference, then you have a good reason to stick with nAGQ=0 in the
full dataset.

This seems a good rule of thumb, although there remains the question of what to do if the subsampled data has convergence failures when nAGQ is greater than 0.
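The subsampling check described in the quote could be sketched roughly as follows. Everything here is illustrative: the simulated data, the variable names (g, x, y), the helper compare_nAGQ() and the subsample size are assumptions, not part of the original suggestion. Whole groups are sampled so that the grouping structure is preserved.

```r
library(lme4)
set.seed(101)

## Hypothetical data: 200 groups with 10 binary observations each
dd <- data.frame(g = factor(rep(1:200, each = 10)), x = rnorm(2000))
u <- rnorm(200, sd = 1)                      # true random intercepts
dd$y <- rbinom(2000, size = 1,
               prob = plogis(-0.5 + 0.8 * dd$x + u[dd$g]))

## Fit the same model with nAGQ = 0 and nAGQ = 1 on a random subset of groups
compare_nAGQ <- function(data, n_groups = 50) {
  keep <- sample(levels(data$g), n_groups)
  sub <- droplevels(data[data$g %in% keep, ])
  f0 <- glmer(y ~ x + (1 | g), data = sub, family = binomial, nAGQ = 0)
  f1 <- glmer(y ~ x + (1 | g), data = sub, family = binomial, nAGQ = 1)
  cbind(nAGQ0 = fixef(f0),
        nAGQ1 = fixef(f1),
        diff  = fixef(f0) - fixef(f1),
        se1   = sqrt(diag(vcov(f1))))        # scale for judging 'diff'
}

res <- replicate(5, compare_nAGQ(dd), simplify = FALSE)
res[[1]]
```

Comparing the diff column with se1 gives a rough sense of whether the discrepancy is small relative to the uncertainty in the estimates; what counts as "small enough" remains a subject-matter judgment.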

My question is related to another question, in which I could not find any other way to make a model converge aside from setting nAGQ = 0, and usually had convergence failures on subsampled data.

Best Answer

I would say nAGQ = 0 is acceptable when it's "good enough", e.g.

- You simply can't get the model to perform adequately with nAGQ > 0. (I would be highly suspicious of this case; it would be better to fix the underlying problem and/or use multiple optimizers, via allFit(), to confirm that the fit with nAGQ ≥ 1 is actually adequate.)
- Using nAGQ > 0 is computationally infeasible. (glmmTMB may be faster, especially if there are many top-level (variance/covariance + fixed-effect) parameters, although it can only handle nAGQ=1, i.e. the Laplace approximation; MixedModels.jl is likely to be fast as well. GLMMadaptive does nAGQ>1, but I don't know about its speed.)
- You have tested on similar cases where both nAGQ=0 and nAGQ=1 work and found that nAGQ=0 is generally close enough for your purposes (this is a scientific, not a statistical, criterion). There is an example here:

The best description I have found of the profiling assumption is in section 16.2 of the TMB documentation, which states that "one can apply the profile argument to move outer parameters to the inner problem without affecting the model result" when these assumptions hold:

Assumption 1: The partial derivative $\partial_\beta f(u,\beta, \theta)$ is a linear function of $u$.
Assumption 2: The posterior mean is equal to the posterior mode: $E(u|x)=\hat u$.

They hold exactly for linear mixed models. It's hard to guess exactly when they are likely to hold otherwise, but I would very tentatively speculate as follows. Assumption (1) will be OK when the inverse link function is "not too nonlinear", i.e. when the linear predictors are mostly in a range where the second derivative of the inverse link function is small (e.g. for a logit link, values near zero). Assumption (2) will be OK when the "effective sample size" per group is large, so that we can wave our hands in the direction of the Central Limit Theorem (e.g. binomial observations with $\textrm{min}(pN, (1-p)N)$ not too small, or Poisson observations with average counts not too small). These are also the cases where penalized quasi-likelihood is likely to be OK; see Breslow 2003.

But I'm not aware of a systematic exploration of these issues.
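As a rough numerical illustration of the tentative claim about Assumption 2, the sketch below compares the posterior mode and posterior mean of a single group's random intercept in a logistic model, for a small and a large group with the same observed proportion. It uses base R only. The function name post_mode_mean() and the fixed values beta0 = 0 and sigma = 1 are assumptions chosen for illustration, not anything from lme4 or TMB.

```r
## Posterior mode vs. posterior mean of one group's random intercept u,
## for y successes out of n trials, with u ~ N(0, sigma^2).
## beta0 and sigma are held fixed at illustrative (assumed) values.
post_mode_mean <- function(y, n, beta0 = 0, sigma = 1) {
  ## log of the unnormalised posterior density of u
  log_post <- function(u) {
    dbinom(y, size = n, prob = plogis(beta0 + u), log = TRUE) +
      dnorm(u, mean = 0, sd = sigma, log = TRUE)
  }
  dens <- function(u) exp(log_post(u))

  ## posterior mode by one-dimensional optimisation
  u_hat <- optimize(log_post, interval = c(-10, 10),
                    maximum = TRUE, tol = 1e-10)$maximum

  ## integrate in two pieces split at the mode so a narrow peak is not missed
  integ <- function(g) {
    integrate(g, -Inf, u_hat, rel.tol = 1e-8)$value +
      integrate(g, u_hat, Inf, rel.tol = 1e-8)$value
  }
  u_bar <- integ(function(u) u * dens(u)) / integ(dens)

  c(mode = u_hat, mean = u_bar, gap = abs(u_bar - u_hat))
}

small <- post_mode_mean(y = 1,   n = 4)    # min(successes, failures) = 1
large <- post_mode_mean(y = 100, n = 400)  # min(successes, failures) = 100
rbind(small, large)
```

In this toy setting the gap between mode and mean is noticeably smaller for the larger group, which is in line with the hand-waving appeal to the Central Limit Theorem; it says nothing about how large a gap is tolerable in a real model.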

Breslow, Norm. “Whither PQL?” UW Biostatistics Working Paper Series, #192, 2003. http://www.bepress.com/uwbiostat/paper192.

Related Solutions

GLMER – Troubleshooting Model Convergence Issues in R Using lme4 nlme Packages

A couple of notes:

- In complex designs such as the one you describe, the data often do not contain strong enough correlations to support all of the random effects that the design dictates. This is especially true for binary outcomes, which carry the least information compared with other types of data (e.g., count, ordinal, continuous). Hence, it is often better to start with a simpler model and add random effects one at a time to see whether each improves the fit sufficiently.
- As further evidence for the point above, note that the estimated variance for the term LaunchedFromEpochYEAR is practically zero, suggesting that you do not need this random-effect term in the model.
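The "start simple and add random effects one at a time" advice might look like the following sketch. The data are simulated and the names (g1, g2, m_small, m_big) are hypothetical; g2 is deliberately given no true effect, so its estimated variance would be expected to be at or near zero.

```r
library(lme4)
set.seed(1)

## Hypothetical crossed design: 30 levels of g1, 20 levels of g2
dd <- data.frame(g1 = factor(rep(1:30, each = 20)),
                 g2 = factor(rep(1:20, times = 30)))
u1 <- rnorm(30)                               # only g1 has a real effect
dd$y <- rbinom(nrow(dd), size = 1, prob = plogis(-0.3 + u1[dd$g1]))

## Start with the simpler model, then add one random effect
m_small <- glmer(y ~ 1 + (1 | g1), data = dd, family = binomial)
m_big   <- glmer(y ~ 1 + (1 | g1) + (1 | g2), data = dd, family = binomial)

anova(m_small, m_big)   # likelihood-ratio comparison of the two fits
VarCorr(m_big)          # a near-zero SD for g2 suggests the term is not needed
isSingular(m_big)       # TRUE if some variance is estimated on the boundary
```

The likelihood-ratio test for a variance component is conservative because the null value lies on the boundary of the parameter space, so the p-value is best read as a rough guide rather than an exact test.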

Multilevel Analysis – Why Does This 3-Level Multilevel Logistic Regression Fail to Converge?

I'm going to answer this with two pictures and a lot of code.

The underlying difficulty with your model is that you have a very uneven distribution of samples, with very few samples in the early years.

Points (areas proportional to sample size; some invisible because the samples are very small) show proportions by statistic/year/categorisation with exact binomial CIs; blue curves are loess fits to the overall data set (not split by statistic). In a lot of the early years, there are very few observations, all of which are zero; this is going to lead to problems very similar to complete separation. Centering the year helps with this a little bit.

Results of plotting allFit: the differences in the fixed parameter estimates are probably negligible relative to the scale of the confidence intervals. The intercept parameter is of very large magnitude, as is the standard deviation of the articleID:journalID random effect (the CIs are also probably very large; I didn't compute profile CIs ...). The journalID SD is effectively zero (a singular model).

Overall, I would use the results from nloptr BOBYQA (the best negative log-likelihood). I am definitely concerned about the validity/accuracy of the Laplace approximation; about half of the journalID groups have fewer than 5 observations, and all of the journalID:articleID groups do. (A usual rule of thumb for adequacy of the Laplace approximation for binary data is that the minimum of the number of zeros and the number of ones per group should be greater than 5-10 ...). I tried to run GLMMadaptive::mixed_model() (because glmer can't do nAGQ>1 with multiple random effects) but hit an error ... (it might be possible to get around this by defining the journalID:articleID interaction explicitly and dropping empty levels ... ??) (update: I think this is because the grid on which we would need to evaluate the quadrature is impossibly large (1800^n) ...)

library(lme4)
library(GLMMadaptive)
library(ggplot2); theme_set(theme_bw())
library(colorspace)
library(tidyverse)

dat <- read.csv("CV543932.csv")
## small adjustments; in particular, scaling/centering year helps
##  with stability
dat <- transform(dat,
                 error = as.numeric(error),
                 categorisation = factor(categorisation),
                 year = drop(scale(year)))

m2 <- glmer(error ~ 1 + year + categorisation + statistic +
              (1 | journalID/articleID),
            data = dat,
            family = binomial(link = "logit"))

summary(m2)

aa <- allFit(m2, parallel = "multicore", ncpus = 6)

try(
m3 <- mixed_model(error ~ 1 + year + categorisation + statistic,
                  random = ~1 | journalID/articleID,
                  data = dat,
                  family = binomial(link = "logit"))
)
## Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
##   invalid 'times' value

save(aa, m2, file = "CV543932.rda")

This is using the development version of broom.mixed (remotes::install_github("bbolker/broom.mixed")), where I just added tidiers for allFit objects.

library(broom.mixed)  ## tidy()/glance() methods for allFit objects

tt <- suppressWarnings(
    ## tidy() only has models that succeeded at some level
    left_join(tidy(aa, conf.int = TRUE), glance(aa), by = "optimizer")
    ## keep only 'adequate' models
    %>% filter(NLL_rel < 0.3)
    ## adjustments for plotting
    %>% mutate(across(term, ~ifelse(!is.na(group),
                                    paste(term,group, sep = "."),
                                    term)))
    %>% select(optimizer:estimate, conf.low, conf.high, NLL_rel)
    %>% mutate(across(optimizer, ~fct_reorder(., NLL_rel)))
    %>% mutate(across(term, fct_inorder))
)

ggplot(tt,
       aes(y=optimizer, x=estimate, xmin=conf.low, xmax=conf.high, colour = NLL_rel)) +
  geom_point() +
  geom_linerange() +
  facet_wrap(~term, scales = "free")

prop_cl_binom <- function(x, ...)  {
  bb <- binom.test(sum(x), length(x))
  data.frame(y = mean(x), ymin = bb$conf.int[1], ymax = bb$conf.int[2])
}

prop_size_binom <- function(x, ...)  {
  data.frame(y = mean(x), size = sum(x))
}

pd <- position_dodge(width=0.5)
ggplot(dat, aes(year, error, colour = statistic, shape = categorisation)) +
  facet_wrap(~categorisation) +
  stat_summary(fun.data = prop_cl_binom,
               position = pd,
               geom = "linerange") +
  stat_summary(fun.data = prop_size_binom,
               position = pd,
               alpha = 0.5,
               geom = "point",
               show.legend = c(size = TRUE)) +
  ## scale_size_area() +
  scale_colour_discrete_qualitative() +
  ## scale_size_area(trans="log10") +
  scale_y_continuous(limits=c(0,NA), oob = scales::squish) +
  geom_smooth(method="loess",
              ## method.args = list(family=binomial),
              aes(group = 1),
              formula = y~x)


ggsave("CV543932_2.png", width=10, height=6)
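The rule of thumb quoted above for the Laplace approximation (the minimum of the number of zeros and ones per group should exceed roughly 5-10) is easy to check directly. The helper below, min_class_count(), is a hypothetical name and the toy vectors are made up for illustration; it uses base R only.

```r
## For each group, the smaller of (number of ones, number of zeros)
## in a 0/1 response vector y.
min_class_count <- function(y, group) {
  ones <- tapply(y, group, sum)
  n    <- tapply(y, group, length)
  pmin(ones, n - ones)
}

## Made-up toy data: three groups of sizes 4, 8 and 2
y <- c(0, 0, 0, 1,  1, 1, 0, 0, 1, 0, 1, 0,  0, 0)
g <- rep(c("a", "b", "c"), times = c(4, 8, 2))

mc <- min_class_count(y, g)
mc             # per-group minimum of zeros and ones
mean(mc < 5)   # fraction of groups falling below a threshold of 5
```

Assuming error has been converted to 0/1 as in the transform() call above, something like with(dat, min_class_count(error, journalID)) would give the corresponding counts for the journalID grouping.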
