Solved – Singular fit with simplest random structure in glmer (lme4)

glmm · lme4-nlme · mixed model

I am trying to fit mixed models (logistic regression) to a data frame with the glmer function from lme4, but I always receive this message: "boundary (singular) fit: see ?isSingular"

Even if I create a model with just an intercept and the simplest random part (a random intercept for one factor), the variance for this random factor is 0.

 Family: binomial  ( logit )
Formula: PointGagneparleServeur ~ 1 + (1 | Tour)
   Data: DataModel_Logit_allRF_AusOpen
      AIC       BIC    logLik  deviance  df.resid 
 480.7822  488.5765 -238.3911  476.7822       362 
Random effects:
 Groups Name        Std.Dev.
 Tour   (Intercept) 0       
Number of obs: 364, groups:  Tour, 6
Fixed Effects:
(Intercept)  
     0.5639  
convergence code 0; 1 optimizer warnings; 0 lme4 warnings 

Yet I have observations for every level of the factor:

table(DataModel_Logit_allRF_AusOpen$PointGagneparleServeur,DataModel_Logit_allRF_AusOpen$Tour)

    1erTour 2emeTour 3emeTour 8eme Quart Demi
  0      26       24       12   35    20   15
  1      40       36       37   59    32   28

and the dependent variable PointGagneparleServeur is actually numeric.

(FYI, I recently "upgraded" my OS to Catalina 10.15 and have been experiencing several bugs with other (non-programming) software since, so I am mentioning it just in case it could play a role…)

Does anyone have an idea why I have this issue?

Best Answer

Indeed, your data do not support the hypothesis that there is significant variation in the outcome between the levels of the grouping factor.

library(lme4)
library(tidyverse)

dat <- data.frame(f = rep(letters[1:6], c(26 + 40, 24 + 36, 12 + 37, 35 + 59, 20 + 32, 15 + 28)),
                  y = rep(c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1), c(26, 40, 24, 36, 12, 37, 35, 59, 20, 32, 15, 28)))
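As a quick sanity check (not part of the original answer), cross-tabulating the reconstructed data should reproduce the counts from the table in the question:

```r
# Rebuild the data exactly as above, then cross-tabulate outcome by group.
dat <- data.frame(f = rep(letters[1:6], c(26 + 40, 24 + 36, 12 + 37, 35 + 59, 20 + 32, 15 + 28)),
                  y = rep(c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1),
                          c(26, 40, 24, 36, 12, 37, 35, 59, 20, 32, 15, 28)))
table(dat$y, dat$f)
#      a  b  c  d  e  f
#   0 26 24 12 35 20 15
#   1 40 36 37 59 32 28
```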

fit <- glmer(y ~ 1 + (1 | f), family = binomial, data = dat)

summary(fit)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: y ~ 1 + (1 | f)
   Data: dat

     AIC      BIC   logLik deviance df.resid 
   480.8    488.6   -238.4    476.8      362 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.3257 -1.3257  0.7543  0.7543  0.7543 

Random effects:
 Groups Name        Variance Std.Dev.
 f      (Intercept) 0        0       
Number of obs: 364, groups:  f, 6

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   0.5639     0.1090   5.173 2.31e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
convergence code: 0
boundary (singular) fit: see ?isSingular
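The singular fit can also be confirmed programmatically: lme4's isSingular() tests whether any variance-component estimate sits on the boundary of its parameter space.

```r
# isSingular() returns TRUE when a variance component is estimated at (or
# numerically indistinguishable from) zero, as with the random intercept here.
isSingular(fit)   # TRUE for this model
VarCorr(fit)      # shows the zero standard-deviation estimate for f
```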

This is because you don't have enough data to conclude that the groups significantly differ with respect to the outcome measure:

> dat %>% group_by(f) %>% summarize(m = mean(y))
# A tibble: 6 x 2
  f         m
  <fct> <dbl>
1 a     0.606
2 b     0.6  
3 c     0.755
4 d     0.628
5 e     0.615
6 f     0.651

Notice all the means are fairly close together. Given the total number of observations, the number of observations in each group, and, most critically, the small number of groups (only 6), it is hard to establish that any between-group variation exists. I recommend looking into rstanarm or brms to fit the analogous model from a fully Bayesian point of view. (However, this is my opinion, and there is by no means consensus in the statistical community on how to deal with this problem; see ?isSingular.)

library(rstanarm)
fit2 <- stan_glmer(y ~ 1 + (1 | f), family = binomial, data = dat)
fit2
stan_glmer
 family:       binomial [logit]
 formula:      y ~ 1 + (1 | f)
 observations: 364
------
            Median MAD_SD
(Intercept) 0.6    0.1   

Error terms:
 Groups Name        Std.Dev.
 f      (Intercept) 0.24    
Num. levels: f 6 

------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg
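Another option listed in ?isSingular, as a middle ground between plain glmer and a fully Bayesian fit (my suggestion, not part of the original answer), is the blme package: its bglmer() keeps the lme4 machinery but adds a mildly informative prior on the covariance parameters, which typically pulls the variance estimate off the boundary.

```r
# Sketch: same model as above, refit with blme's default covariance prior.
# Assumes the `dat` data frame constructed earlier in the answer.
library(blme)
fit3 <- bglmer(y ~ 1 + (1 | f), family = binomial, data = dat)
summary(fit3)  # the random-intercept variance is now estimated away from 0
```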