Binomial Distribution – Binomial vs. Quasi-Binomial Model: Comprehensive Comparison

binomial distribution, generalized linear model, glmm, quasi-likelihood

I was trying to fit a GLMM with a binomial distribution (for Yes/No data) in R and kept running into convergence warnings, which seemed well founded given the similar SEs and p-values for the different predictors in the model. After a bit of trial and error, I was able to fit this model by specifying BOBYQA as the optimizer for both parts and increasing the maximum number of iterations to 1000.

Example:

glmer(DCyn ~ Hc + Tc + Cc + Mc + (1 | ID),
      data = data,
      control = glmerControl(optimizer = "bobyqa",
                             optCtrl = list(maxfun = 2e4)))

However, a colleague suggested that I should try running the model with a quasi-binomial distribution. I had no problems fitting my original model with a quasi-binomial distribution (i.e., without changing anything in control), so now I'm wondering which model is most appropriate for the data.

Normally, I would compare AICs between the two models, but I'm unsure how to do this with a quasi-binomial model (and whether the qAIC is comparable with the AIC from the binomial model). Any thoughts? Or is there a better way to compare these models?

Best Answer

Overdispersion is a phenomenon that often applies to binomial-like data ($y$ successes out of $n$ cases with $n>1$), but it cannot apply to binary data. If $y$ is binary (1/0) and $E(y)=p$ then it must be that $\mathrm{var}(y)=p(1-p)$. There is no mathematical possibility that the variance can be any greater or less than that given $E(y)$.
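To spell out the step behind that claim: a 0/1 variable satisfies $y^2=y$, so its second moment equals its mean, and the variance is then fully determined by $p$. A short derivation, using only the symbols already defined above:

$$
E(y^2) = E(y) = p \quad \text{since } y^2 = y \text{ for } y \in \{0,1\},
\qquad
\mathrm{var}(y) = E(y^2) - \{E(y)\}^2 = p - p^2 = p(1-p).
$$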

If your response variable DCyn is binary (Yes/No) then it is impossible for it to be overdispersed (or under-dispersed) so the use of quasi-binomial models does not seem meaningful to me.

A similar point has been made before on this forum.

Binary data can, on the other hand, be correlated within groups, which is what a mixed model tries to estimate. If the binary observations are positively correlated within groups, then the data would in fact be overdispersed if aggregated into binomial observations by group.

To fit a GLMM you would need to specify the family argument to glmer, which you have not currently done. From the code you give, you are fitting a Gaussian linear model to binary data, which is not at all appropriate. You don't appear to have fitted a binomial or a quasi-binomial model, so trying to compare them is a bit moot.
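A minimal sketch of both points, using simulated data rather than the data from the question. The variable names (y, x, id), the group sizes, and the random-intercept standard deviation are all assumptions made up for illustration. The first part uses only base R to compare the estimated dispersion at the binary level with the dispersion after aggregating by group. The second part, which runs only if lme4 is installed, shows the glmer call with family = binomial supplied explicitly.

```r
set.seed(1)

# Hypothetical data: 200 groups with 10 binary observations each and a
# group-level random intercept on the logit scale.
n_groups <- 200
n_per    <- 10
id <- rep(seq_len(n_groups), each = n_per)
u  <- rnorm(n_groups, mean = 0, sd = 1.5)    # random intercepts (assumed sd)
x  <- rnorm(n_groups * n_per)                # one illustrative predictor
p  <- plogis(-0.5 + 0.8 * x + u[id])
dat <- data.frame(y = rbinom(length(p), size = 1, prob = p),
                  x = x,
                  id = factor(id))

# Binary level: the estimated dispersion (Pearson chi-square / residual df)
# stays essentially at 1, whatever the within-group correlation is.
fit_binary  <- glm(y ~ 1, family = quasibinomial, data = dat)
disp_binary <- summary(fit_binary)$dispersion

# Group level: aggregate to successes out of n_per trials per group.
agg   <- aggregate(y ~ id, data = dat, FUN = sum)
agg$n <- n_per
fit_grouped  <- glm(cbind(y, n - y) ~ 1, family = quasibinomial, data = agg)
disp_grouped <- summary(fit_grouped)$dispersion

cat("dispersion, binary rows:     ", round(disp_binary, 3), "\n")
cat("dispersion, aggregated by id:", round(disp_grouped, 3), "\n")

# Binomial GLMM with the family stated explicitly (requires lme4).
if (requireNamespace("lme4", quietly = TRUE)) {
  fit_glmm <- lme4::glmer(
    y ~ x + (1 | id),
    data    = dat,
    family  = binomial,
    control = lme4::glmerControl(optimizer = "bobyqa",
                                 optCtrl = list(maxfun = 2e4))
  )
  print(summary(fit_glmm))
}
```

In this simulation the aggregated dispersion should come out well above 1, while the binary-level estimate cannot reveal the correlation at all. The within-group correlation is instead picked up by the random intercept variance in the GLMM.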
