Solved – Default lme4 optimizer requires lots of iterations for high-dimensional data

lme4-nlme, mixed model, numerics, optimization, r

TL;DR: with the default optimizer, the number of iterations lme4 needs appears to grow linearly with the number of model parameters, and the fit is way slower than an equivalent glm with dummy variables for the groups. Is there anything I can do to speed it up?


I'm trying to fit a fairly large hierarchical logit model (~50k rows, 100 columns, 50 groups). Fitting an ordinary logit model to the data (with dummy variables for the groups) works fine, but the hierarchical model appears to get stuck: the first optimization phase completes without trouble, but the second churns through a large number of iterations without anything changing and without stopping.

EDIT: I suspect the problem is mainly that I have so many parameters, because when I try to set maxfun to a lower value I get a warning:

Warning message:
In commonArgs(par, fn, control, environment()) :
  maxfun < 10 * length(par)^2 is not recommended.

However, the parameter estimates aren't changing at all over the course of the optimization, so I'm still confused about what to do. When I tried setting maxfun in the optimizer controls anyway (despite the warning), the fit seemed to hang after the optimization finished.
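
For concreteness, the cap can be passed through glmerControl's optCtrl list; a minimal sketch (the maxfun value here is just illustrative):

library(lme4)
# cap the Nelder-Mead evaluation budget via optCtrl
ctrl <- glmerControl(optimizer = "Nelder_Mead",
                     optCtrl = list(maxfun = 50000))
# test.model <- glmer(test.formula, data = test.case, family = "binomial",
#                     control = ctrl, verbose = TRUE)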

Here's some code that reproduces the problem for random data:

library(lme4)

set.seed(1)

SIZE <- 50000
NGRP <- 50
NCOL <- 100

# Simulate data: skewed group sizes, a mostly-1 outcome, and NCOL uniform covariates
test.case <- data.frame(i=1:SIZE)
test.case[["grouping"]] <- sample(NGRP, size=SIZE, replace=TRUE, prob=1/(1:NGRP))
test.case[["y"]] <- sample(c(0, 1), size=SIZE, replace=TRUE, prob=c(0.05, 0.95))

# Start with a random intercept per group, then add the covariates one by one
test.formula <- y ~ (1 | grouping)

for (i in 1:NCOL) {
    colname <- paste("col", i, sep="")
    test.case[[colname]] <- runif(SIZE)
    test.formula <- update.formula(test.formula, as.formula(paste(". ~ . +", colname)))
}

print(test.formula)

test.model <- glmer(test.formula, data=test.case, family='binomial', verbose=TRUE)

This outputs:

start par. =  1 fn =  19900.78 
At return
eval:  15 fn:      19769.402 par:  0.00000
(NM) 20: f = 19769.4 at           0     <other numbers>
(NM) 40: f = 19769.4 at           0     <other numbers>

I tried setting NCOL to other values, and the number of iterations appears to be roughly 40 per column. Obviously this becomes a huge pain as I add more columns. Are there tweaks I can make to the optimization algorithm that will reduce the dependence on the number of columns?
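
For reference, the plain-logit comparison mentioned at the top can be reproduced along these lines; a sketch that uses lme4's nobars() to strip the random-effects term and factor(grouping) for the dummy variables:

library(lme4)  # for nobars()
fixed.formula <- update.formula(nobars(test.formula), . ~ . + factor(grouping))
glm.model <- glm(fixed.formula, data = test.case, family = binomial)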

Best Answer

One thing you could try is to change the optimizer. See Ben Bolker's comment at this github issue. The nlopt implementation of bobyqa is usually much faster than the default (at least whenever I try it).

library(nloptr)
## default options for the NLopt BOBYQA run
defaultControl <- list(algorithm="NLOPT_LN_BOBYQA", xtol_rel=1e-6, maxeval=1e5)
## wrapper with the calling convention glmerControl() expects for a user-supplied optimizer
nloptwrap2 <- function(fn, par, lower, upper, control=list(), ...) {
    for (n in names(defaultControl))
      if (is.null(control[[n]])) control[[n]] <- defaultControl[[n]]
    res <- nloptr(x0=par, eval_f=fn, lb=lower, ub=upper, opts=control, ...)
    with(res, list(par=solution,
                   fval=objective,
                   feval=iterations,
                   conv=if (status > 0) 0 else status,  # nloptr status > 0 means success
                   message=message))
}

system.time(test.model <- glmer(test.formula, data=test.case,
                                family='binomial', verbose=TRUE))

system.time(test.model2 <- update(test.model,
                                  control=glmerControl(optimizer="nloptwrap2")))
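
A quick sanity check (a sketch; the tolerance is arbitrary) that the faster optimizer lands on essentially the same fit:

logLik(test.model)
logLik(test.model2)
all.equal(fixef(test.model), fixef(test.model2), tolerance = 1e-4)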

Also, see this answer for more options and this thread from R-sig-mixed-models (which looks more relevant to your issue).
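
For completeness, a sketch of other built-in choices you can swap in the same way (glmerControl also accepts a length-2 vector giving one optimizer per optimization phase):

# minqa's BOBYQA for both phases
ctl_bobyqa <- glmerControl(optimizer = "bobyqa")
# different optimizers for the two phases
ctl_mixed  <- glmerControl(optimizer = c("bobyqa", "Nelder_Mead"))
# test.model3 <- update(test.model, control = ctl_bobyqa)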

Edit: the nloptr advice above is out of date. In lme4 1.1-7 and later, an nloptr-based wrapper ships with the package (see ?nloptwrap), so all you have to do is add

control = [g]lmerControl(optimizer = "nloptwrap") # +g if fitting with glmer

to your call.
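
For the example above, that reduces to something like this (a sketch):

test.model3 <- glmer(test.formula, data = test.case, family = "binomial",
                     control = glmerControl(optimizer = "nloptwrap"),
                     verbose = TRUE)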
