Solved – Binomial GLMM with a categorical predictor where one level has only successes

Tags: generalized-linear-model, lme4-nlme, separation

I am running a glmm with a binomial response variable and a categorical predictor. The random effect is given by the nested design used for the data collection. The data looks like this:

m.gen1$treatment
 [1] sucrose      control      protein      control      no_injection .....
Levels: no_injection control sucrose protein
m.gen1$emergence 
 [1]  1  0  0  1  0  1  1  1  1  1  1  0  0....
m.gen1$nest
 [1] 1  1  1  2  2  3  3  3  3  4  4  4  .....
Levels: 1 2 3 4 5 6 8 10 11 13 15 16 17 18 20 22 24

The first model I ran looks like this:

m.glmm.em <- glmer(emergence ~ treatment + (1|nest), family = binomial, data = m.gen1)

I get two warnings that look like this:

Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.0240654 (tol = 0.001, component 4)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?

The model summary shows that one of the treatments has an unusually large standard error, which you can see here:

Fixed effects:
                 Estimate Std. Error z value Pr(>|z|)  
(Intercept)         2.565      1.038   2.472   0.0134 *
treatmentcontrol   -1.718      1.246  -1.378   0.1681  
treatmentsucrose   16.863   2048.000   0.008   0.9934  
treatmentprotein   -1.718      1.246  -1.378   0.1681 

I tried the different optimizers available through glmerControl, as well as functions from other packages, and I get similar output. I have also run the model with glm, ignoring the random effect, and the problem persists. While exploring the data I realized that the treatment with the huge standard error has only successes in the response variable. Just to check whether that could be causing the problem, I added a fake data point with a "failure" for that treatment; the model then runs smoothly and gives reasonable standard errors. You can see that here:

Fixed effects:
                 Estimate Std. Error z value Pr(>|z|)  
(Intercept)        3.4090     1.6712   2.040   0.0414 *
treatmentcontrol  -1.8405     1.4290  -1.288   0.1978  
treatmentsucrose  -0.2582     1.6263  -0.159   0.8738  
treatmentprotein  -2.6530     1.5904  -1.668   0.0953 .
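
In case it is useful, the all-successes pattern for that treatment is easy to confirm with a quick cross-tabulation (shown here without the real counts):

with(m.gen1, table(treatment, emergence))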

I was wondering whether my intuition is right that the lack of failures for that treatment prevents good estimation, and how I can work around this issue.

Thanks in advance!

Best Answer

Your intuition is exactly right: this phenomenon is called complete separation. You can find quite a lot about it (now that you know its name) by Googling around; it is fairly thoroughly discussed here in a general context, and here in the context of GLMMs. The standard solution is to add a small term that pushes the parameter estimates back toward zero; in frequentist contexts this is called a penalized or bias-reduced method. The standard algorithm is due to Firth (1993, "Bias reduction of maximum likelihood estimates", Biometrika 80: 27–38) and is implemented in the logistf package on CRAN. In Bayesian contexts the same idea is framed as putting a weak prior on the fixed-effect parameters.
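
For example, ignoring the random effect for a moment, a Firth-penalized fit with logistf would look something like this (a sketch using the variable names from your question, which I haven't run on your data):

library(logistf)
## Firth's bias-reduced logistic regression; note that this drops the (1|nest) random effect
f1 <- logistf(emergence ~ treatment, data = m.gen1)
summary(f1)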

To my knowledge Firth's algorithm hasn't been extended to GLMMs, but you can use the Bayesian trick via the blme package, which puts a thin Bayesian layer on top of the lme4 package. Here's an example from the above-linked GLMM discussion:

cmod_blme_L2 <- bglmer(predation ~ ttt + (1|block), data = newdat,
                       family = binomial,
                       fixef.prior = normal(cov = diag(9, 4)))

The first two lines in this example are exactly the same as we would use in the standard glmer model; the last specifies that the prior for the fixed effects is a multivariate normal distribution with a diagonal variance-covariance matrix. The matrix is 4x4 (because we have 4 fixed-effect parameters in this example), and the prior variance of each parameter is 9 (corresponding to a standard deviation of 3, which is pretty weak -- that means +/- 2SD is (-6,6), which is a very large range on the logit scale).
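
Translated to your model, which also has four fixed-effect parameters (the intercept plus three treatment contrasts), this would look something like the following (an untested sketch using the variable names from your question):

library(blme)
## same weak multivariate normal prior on the fixed effects as in the example above
m.blme <- bglmer(emergence ~ treatment + (1|nest), data = m.gen1,
                 family = binomial,
                 fixef.prior = normal(cov = diag(9, 4)))
summary(m.blme)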

The very large standard errors of the parameters in your example are a symptom of a phenomenon closely related to complete separation, the Hauck-Donner effect, which occurs whenever a logistic model produces extreme parameter estimates.
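
One practical consequence is that the Wald z statistics and p-values in the summary cannot be trusted here. If you need a test of the overall treatment effect, a likelihood-ratio comparison of nested models is a more reliable alternative (a sketch reusing the object names from your question):

## refit without treatment and compare by a likelihood-ratio test,
## which does not rely on the (inflated) Wald standard errors
m0 <- glmer(emergence ~ 1 + (1|nest), family = binomial, data = m.gen1)
anova(m0, m.glmm.em)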

Two more potentially useful references (I haven't dug into them yet myself):

  • Gelman A, Jakulin A, Pittau MG and Su Y-S (2008) A weakly informative default prior distribution for logistic and other regression models. Annals of Applied Statistics 2(4): 1360–1383.
  • Cortiñas Abrahantes J and Aerts M (2012) A solution to separation for clustered binary data. Statistical Modelling 12(1): 3–27. doi:10.1177/1471082X1001200102

A more recent Google Scholar search for "bglmer 'complete separation'" finds:

  • Quiñones AE and Wcislo WT (2015) Cryptic extended brood care in the facultatively eusocial sweat bee Megalopta genalis. Insectes Sociaux 62(3): 307–313.