I am fitting a binomial GLM to relative abundance data, for example:
model <- glm(cbind(number_pres, number_abs) ~ Var1 + Var2 + Var3 + Var4…, family = binomial, data = Data)
My sample size is about 700, and I have about 15 explanatory variables. I can't use Poisson because the total number of "trials" varies per sample point (relative abundance accounts for this), and I'd prefer not to simplify to presence/absence.
My global model is overdispersed (residual deviance / degrees of freedom = 2.8), and its residuals show some odd patterns (see below).
The overdispersion remains whether I add interactions or polynomial terms, transform variables, remove influential points, or remove the variables with a VIF of about 4 (the highest VIF in the set). Removing the influential point and the highest-VIF variable does seem to help with the residual patterns, but not with the overdispersion. I can use family=quasibinomial, but then of course many of the variables are no longer significant, and I find this harder to interpret and understand. If possible, I'd like to just fix the overdispersion.
Two things I suspect may be causing issues are the high number of zeros in my species data and spatial autocorrelation. I ran a few tests, and spatial autocorrelation of the residuals might be a minor issue (durbinWatsonTest in "car" rejected the null hypothesis of no autocorrelation, but in the "gstat" variogram the semivariance hovered around 2-2.5). I repeated the model using presence/absence in a Bernoulli GLM (overdispersion doesn't exist for Bernoulli data): there are no residual patterns, and I get similar results when using a zero-inflated binomial GLM (package glmmADMB). I have yet to find a zero-inflated model for a binomial GLM with proportions, but maybe this indicates that zeros aren't the problem either.
Should I just use quasibinomial GLMs for my model and the subsequent nested model set? Or is there a solution I am missing?
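For reference, the dispersion check and the quasibinomial refit described in the question can be sketched as follows. This is a minimal illustration on simulated data, not the actual data set: the single predictor Var1, the simulated trial counts and the extra per-point noise term are all assumptions made so that the example runs on its own.

```r
# Simulated stand-in data; every value below is hypothetical.
set.seed(1)
n      <- 700
Var1   <- rnorm(n)
trials <- sample(5:30, n, replace = TRUE)   # number of "trials" varies per sample point
noise  <- rnorm(n, sd = 1)                  # unmodelled variation, which induces overdispersion
p      <- plogis(-1 + 0.5 * Var1 + noise)
number_pres <- rbinom(n, trials, p)
number_abs  <- trials - number_pres
Data <- data.frame(number_pres, number_abs, Var1)

# Plain binomial fit, as in the question
fit_bin <- glm(cbind(number_pres, number_abs) ~ Var1,
               family = binomial, data = Data)

# Two common dispersion estimates; values well above 1 suggest overdispersion
disp_deviance <- deviance(fit_bin) / df.residual(fit_bin)
disp_pearson  <- sum(residuals(fit_bin, type = "pearson")^2) / df.residual(fit_bin)

# Quasibinomial refit: same coefficients, standard errors scaled by sqrt(dispersion)
fit_quasi <- glm(cbind(number_pres, number_abs) ~ Var1,
                 family = quasibinomial, data = Data)

se_bin   <- summary(fit_bin)$coefficients[, "Std. Error"]
se_quasi <- summary(fit_quasi)$coefficients[, "Std. Error"]
se_ratio <- se_quasi / se_bin

print(c(deviance = disp_deviance, pearson = disp_pearson))
print(se_ratio)
```

The point estimates are identical in both fits; only the standard errors are inflated, which is why variables lose significance under family=quasibinomial.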
Best Answer
Overdispersion occurs for a number of reasons, but with presence/absence data it is often caused by clustering of observations and correlation between observations.
From Broström & Holmberg (2011), Generalised Linear Models with Clustered Data: Fixed and random effects models with glmmML:
"Generally speaking, a random effects model is appropriate if the observed clusters may be regarded as a random sample from a (large, possibly infinite) pool of possible clusters. The observed clusters are of no practical interest per se, but the distribution in the pool is. Or this distribution is regarded as a nuisance that needs to be controlled for."
https://cran.r-project.org/web/packages/eha/vignettes/glmmML.pdf
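As a rough sketch of what a random-intercept model for clustered binomial data could look like, the example below compares a plain binomial GLM with a mixed model on simulated data. Several things here are assumptions rather than part of the question: the grouping factor `cluster` (for example a site or transect), the single predictor Var1, the simulated values, and the use of `glmer` from lme4 in place of glmmML from the cited vignette.

```r
library(lme4)  # assumption: lme4 is installed; glmmML is the package used in the vignette

# Simulated clustered data; all names and values are hypothetical.
set.seed(2)
n_clusters  <- 70
per_cluster <- 10
n       <- n_clusters * per_cluster
cluster <- factor(rep(seq_len(n_clusters), each = per_cluster))
Var1    <- rnorm(n)
trials  <- sample(5:30, n, replace = TRUE)
u       <- rnorm(n_clusters, sd = 1)         # cluster-level random intercepts
p       <- plogis(-1 + 0.5 * Var1 + u[as.integer(cluster)])
number_pres <- rbinom(n, trials, p)
number_abs  <- trials - number_pres
Data <- data.frame(number_pres, number_abs, Var1, cluster)

# Binomial GLM that ignores the clustering
fit_glm <- glm(cbind(number_pres, number_abs) ~ Var1,
               family = binomial, data = Data)

# Binomial GLMM with a random intercept per cluster
fit_glmm <- glmer(cbind(number_pres, number_abs) ~ Var1 + (1 | cluster),
                  family = binomial, data = Data)

# Pearson-based dispersion estimates for both fits
disp_glm  <- sum(residuals(fit_glm,  type = "pearson")^2) / df.residual(fit_glm)
disp_glmm <- sum(residuals(fit_glmm, type = "pearson")^2) / df.residual(fit_glmm)

# Estimated standard deviation of the cluster intercepts
sd_cluster <- attr(VarCorr(fit_glmm)$cluster, "stddev")

print(c(glm = disp_glm, glmm = disp_glmm))
print(sd_cluster)
```

In this simulation the extra variation comes entirely from the cluster intercepts, so modelling them should bring the dispersion estimate down towards 1. With real data that only happens if the chosen grouping actually captures the source of the correlation.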