Solved – Binomial GLMM with a categorical predictor where one level has only successes

Tags: generalized-linear-model, lme4-nlme, separation

I am running a glmm with a binomial response variable and a categorical predictor. The random effect is given by the nested design used for the data collection. The data looks like this:

m.gen1$treatment
 [1] sucrose      control      protein      control      no_injection .....
Levels: no_injection control sucrose protein
m.gen1$emergence 
 [1]  1  0  0  1  0  1  1  1  1  1  1  0  0....
m.gen1$nest
 [1] 1  1  1  2  2  3  3  3  3  4  4  4  .....
Levels: 1 2 3 4 5 6 8 10 11 13 15 16 17 18 20 22 24

The first model I ran looks like this:

m.glmm.em <- glmer(emergence ~ treatment + (1|nest), family = binomial, data = m.gen1)

I get two warnings that look like this:

Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.0240654 (tol = 0.001, component 4)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?

The model summary shows that one of the treatments has an unusually large standard error, which you can see here:

Fixed effects:
                 Estimate Std. Error z value Pr(>|z|)  
(Intercept)         2.565      1.038   2.472   0.0134 *
treatmentcontrol   -1.718      1.246  -1.378   0.1681  
treatmentsucrose   16.863   2048.000   0.008   0.9934  
treatmentprotein   -1.718      1.246  -1.378   0.1681 

I tried the different optimizers available through glmerControl, as well as functions from other packages, and I get similar output. I have also run the model with glm, ignoring the random effect, and the problem persists. While exploring the data I realized that the treatment with the huge standard error has only successes in the response variable. Just to check whether that could be causing the problem, I added a fake data point with a "failure" for that treatment; the model then runs smoothly and gives reasonable standard errors. You can see that here:

Fixed effects:
                 Estimate Std. Error z value Pr(>|z|)  
(Intercept)        3.4090     1.6712   2.040   0.0414 *
treatmentcontrol  -1.8405     1.4290  -1.288   0.1978  
treatmentsucrose  -0.2582     1.6263  -0.159   0.8738  
treatmentprotein  -2.6530     1.5904  -1.668   0.0953 .
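
In case it is useful, the all-successes pattern for that treatment is easy to confirm with a quick cross-tabulation (shown here without the real counts):

with(m.gen1, table(treatment, emergence))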

I was wondering whether my intuition is right that the lack of failures for that treatment prevents good estimation, and how I can work around this issue.

Thanks in advance!

Best Answer

Your intuition is exactly right: this phenomenon is called complete separation. You can find quite a lot about it (now that you know its name) by Googling around; it is fairly thoroughly discussed here in a general context, and here in the context of GLMMs. The standard solution is to add a small term that pushes the parameter estimates back toward zero; in frequentist contexts this is called a penalized or bias-reduced method. The standard algorithm is due to Firth (1993, "Bias reduction of maximum likelihood estimates", Biometrika 80: 27–38) and is implemented in the logistf package on CRAN. In Bayesian contexts the same idea is framed as putting a weak prior on the fixed-effect parameters.
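
For example, ignoring the random effect for a moment, a Firth-penalized fit with logistf would look something like this (a sketch using the variable names from your question, which I haven't run on your data):

library(logistf)
## Firth's bias-reduced logistic regression; note that this drops the (1|nest) random effect
f1 <- logistf(emergence ~ treatment, data = m.gen1)
summary(f1)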

To my knowledge Firth's algorithm hasn't been extended to GLMMs, but you can use the Bayesian trick via the blme package, which puts a thin Bayesian layer on top of the lme4 package. Here's an example from the above-linked GLMM discussion:

cmod_blme_L2 <- bglmer(predation ~ ttt + (1|block), data = newdat,
                       family = binomial,
                       fixef.prior = normal(cov = diag(9, 4)))

The first two lines in this example are exactly the same as we would use in the standard glmer model; the last specifies that the prior for the fixed effects is a multivariate normal distribution with a diagonal variance-covariance matrix. The matrix is 4x4 (because we have 4 fixed-effect parameters in this example), and the prior variance of each parameter is 9 (corresponding to a standard deviation of 3, which is pretty weak -- that means +/- 2SD is (-6,6), which is a very large range on the logit scale).
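
Translated to your model, which also has four fixed-effect parameters (the intercept plus three treatment contrasts), this would look something like the following (an untested sketch using the variable names from your question):

library(blme)
## same weak multivariate normal prior on the fixed effects as in the example above
m.blme <- bglmer(emergence ~ treatment + (1|nest), data = m.gen1,
                 family = binomial,
                 fixef.prior = normal(cov = diag(9, 4)))
summary(m.blme)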

The very large standard errors of the parameters in your example are a symptom of a phenomenon closely related to complete separation, the Hauck-Donner effect, which occurs whenever a logistic model produces extreme parameter estimates.
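
One practical consequence is that the Wald z statistics and p-values in the summary cannot be trusted here. If you need a test of the overall treatment effect, a likelihood-ratio comparison of nested models is a more reliable alternative (a sketch reusing the object names from your question):

## refit without treatment and compare by a likelihood-ratio test,
## which does not rely on the (inflated) Wald standard errors
m0 <- glmer(emergence ~ 1 + (1|nest), family = binomial, data = m.gen1)
anova(m0, m.glmm.em)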

Two more potentially useful references (I haven't dug into them yet myself):

  • Gelman A, Jakulin A, Pittau MG and Su Y-S (2008) A weakly informative default prior distribution for logistic and other regression models. Annals of Applied Statistics 2(4): 1360–1383.
  • Cortiñas Abrahantes J and Aerts M (2012) A solution to separation for clustered binary data. Statistical Modelling 12(1): 3–27. doi:10.1177/1471082X1001200102

A more recent Google Scholar search for "bglmer 'complete separation'" finds:

  • Quiñones AE and Wcislo WT (2015) Cryptic extended brood care in the facultatively eusocial sweat bee Megalopta genalis. Insectes Sociaux 62(3): 307–313.