Solved – Accounting for overdispersion in binomial glm using proportions, without quasibinomial

autocorrelation, binomial distribution, generalized linear model, overdispersion, quasi-binomial

I am fitting a binomial GLM to relative abundance data, for example:

model <- glm(cbind(number_pres, number_abs) ~ Var1 + Var2 + Var3 + Var4…, family = binomial, data = Data)

My sample size is about 700, and I have about 15 explanatory variables. I can't use Poisson because the total number of "trials" varies per sample point (relative abundance accounts for this), and I'd prefer not to simplify to presence/absence.

My global model is overdispersed (residual deviance / residual degrees of freedom = 2.8) and has some funny patterns in the residuals (see below).
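For readers who want to reproduce this kind of check, here is a minimal sketch on simulated data. Nothing in it comes from my real data set: the data frame `sim`, its columns, and the noise level are all made up for illustration. It computes the deviance-based ratio quoted above, plus the Pearson-based ratio, which is the quantity that quasibinomial uses as its dispersion estimate.

```r
# Simulated example only: all names and parameter values are hypothetical.
set.seed(42)
n      <- 300
x      <- rnorm(n)
trials <- sample(10:40, n, replace = TRUE)   # number of "trials" varies per point

# Extra noise on the logit scale creates more variation than the binomial allows
p    <- plogis(-1 + 0.5 * x + rnorm(n, sd = 0.8))
succ <- rbinom(n, size = trials, prob = p)
sim  <- data.frame(succ = succ, fail = trials - succ, x = x)

fit <- glm(cbind(succ, fail) ~ x, family = binomial, data = sim)

# Deviance-based ratio (the statistic quoted in the question)
dev_ratio <- deviance(fit) / df.residual(fit)

# Pearson-based ratio (what quasibinomial reports as the dispersion)
pearson_ratio <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

# Rough goodness-of-fit test; the chi-square approximation assumes
# reasonably large trial counts per observation
p_value <- pchisq(deviance(fit), df.residual(fit), lower.tail = FALSE)

c(deviance = dev_ratio, pearson = pearson_ratio, p = p_value)
```

Both ratios should sit near 1 for a well-specified binomial model. Values well above 1, as in this simulation, point to extra-binomial variation.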

[Figure: validation plots produced with plot(model) in R]

The overdispersion remains whether or not I add interactions or polynomials, transform variables, remove influential points, or remove the variables with VIF ~4 (the highest VIF in the set). Removing the influential point and the highest-VIF variable does seem to help with the residual patterns, but not with the overdispersion. I can use family=quasibinomial, but then of course many of the variables are no longer significant, and I find this harder to interpret and understand. If possible, I'd like to just fix the overdispersion.
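To make the quasibinomial effect concrete, here is a small sketch, again on simulated data with made-up names (`sim2`, `fit_b`, `fit_q`). It illustrates the general mechanism rather than my actual model: the quasibinomial fit keeps the same coefficient estimates and multiplies every standard error by the square root of the estimated dispersion, which is why terms drop out of significance.

```r
# Simulated example only: all names and parameter values are hypothetical.
set.seed(7)
n      <- 300
x1     <- rnorm(n)
x2     <- rnorm(n)
trials <- sample(10:40, n, replace = TRUE)
p      <- plogis(-1 + 0.4 * x1 + 0.1 * x2 + rnorm(n, sd = 0.8))
succ   <- rbinom(n, size = trials, prob = p)
sim2   <- data.frame(succ = succ, fail = trials - succ, x1 = x1, x2 = x2)

fit_b <- glm(cbind(succ, fail) ~ x1 + x2, family = binomial,      data = sim2)
fit_q <- glm(cbind(succ, fail) ~ x1 + x2, family = quasibinomial, data = sim2)

phi  <- summary(fit_q)$dispersion            # Pearson chi-square / residual df
se_b <- coef(summary(fit_b))[, "Std. Error"]
se_q <- coef(summary(fit_q))[, "Std. Error"]

# Same point estimates; standard errors inflated by sqrt(phi)
cbind(estimate = coef(fit_q), se_binomial = se_b,
      se_quasi = se_q, ratio = se_q / se_b, sqrt_phi = sqrt(phi))
```

So quasibinomial does not change what the model estimates, only how much uncertainty is attached to each estimate.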

Two things I suspect may be causing issues are the high number of zeros in my species data and spatial autocorrelation. I ran a few tests, and spatial autocorrelation of the residuals might be a minor issue (in "car", durbinWatsonTest rejected the null of no autocorrelation, but in the "gstat" variogram the semivariance hovered around 2-2.5). I repeated the model using presence/absence in a Bernoulli GLM (overdispersion doesn't exist for Bernoulli): there are no residual patterns, and I get similar results when using a zero-inflated binomial GLM (package glmmADMB). I have yet to find a zero-inflated model for a binomial GLM with proportions, but maybe this indicates that zeros aren't the problem either.

Should I just use quasibinomial glms for my model, and the subsequent nested model set? Or is there a solution I am missing?

Best Answer

Overdispersion occurs for a number of reasons, but with presence/absence data it is often caused by clustering of observations and by correlation between observations.

Taken from Brostrom & Holmberg (2011), "Generalised Linear Models with Clustered Data: Fixed and random effects models with glmmML":

"Generally speaking, a random effects model is appropriate if the observed clusters may be regarded as a random sample from a (large, possibly infinite) pool of possible clusters. The observed clusters are of no practical interest per se, but the distribution in the pool is. Or this distribution is regarded as a nuisance that needs to be controlled for."

https://cran.r-project.org/web/packages/eha/vignettes/glmmML.pdf

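library(lme4)           # glmer()
library(RVAideMemoire)  # overdisp.glmer()

# Observation-level random effect: one factor level per row of Data
Data$obs <- factor(formatC(seq_len(nrow(Data)), flag = "0", width = 3))

# Same fixed effects as the binomial GLM, plus a random intercept per observation
model.glmm <- glmer(cbind(number_pres, number_abs) ~ Var1 + Var2 + Var3 + Var4... +
                      (1 | obs),
                    family = binomial(link = "logit"), data = Data)

overdisp.glmer(model.glmm)  # overdispersion check for the GLMM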
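As a rough illustration of what the (1|obs) term does, here is a sketch on simulated data. It assumes lme4 is installed. The data frame `sim3`, the helper `pearson_ratio()`, and all parameter values are made up for the example, and the helper is a hand-rolled stand-in rather than the RVAideMemoire function.

```r
# Simulated example only: all names and parameter values are hypothetical.
library(lme4)

set.seed(1)
n      <- 200
x      <- rnorm(n)
trials <- sample(5:30, n, replace = TRUE)

# Per-observation noise on the logit scale (true SD = 1) causes overdispersion
eta  <- -0.5 + 0.8 * x + rnorm(n, sd = 1)
succ <- rbinom(n, size = trials, prob = plogis(eta))
sim3 <- data.frame(succ = succ, fail = trials - succ, x = x,
                   obs = factor(seq_len(n)))   # one level per row

fit_glm  <- glm(cbind(succ, fail) ~ x, family = binomial, data = sim3)
fit_olre <- glmer(cbind(succ, fail) ~ x + (1 | obs),
                  family = binomial, data = sim3)

# Hypothetical helper: Pearson chi-square divided by residual df
pearson_ratio <- function(m) {
  sum(residuals(m, type = "pearson")^2) / df.residual(m)
}

ratio_glm  <- pearson_ratio(fit_glm)    # well above 1 here
ratio_olre <- pearson_ratio(fit_olre)   # much smaller once obs absorbs the noise

# Estimated SD of the observation-level random intercept
sd_obs <- as.data.frame(VarCorr(fit_olre))$sdcor[1]

c(glm = ratio_glm, olre = ratio_olre, sd_obs = sd_obs)
```

One caveat: with an observation-level random effect the conditional Pearson ratio often falls below 1, so treat it as a sign that the extra variation has been absorbed, not as a formal test. The estimated random-effect SD is usually the more informative number.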