Solved – Interpreting dispersion parameters of a Poisson GLMM with count data

Tags: generalized-linear-model, glmm, overdispersion

I am working with count data and trying to understand whether the fit of this Poisson generalized linear mixed model (GLMM) is acceptable:

library(lme4)  # provides glmer()
Richness.glmer <- glmer(Richness ~ Unit.type + plot.type + (1 | NFI.Point), family = poisson, data = PartA.data.Birds)

I ran a dispersion test using overdisp_fun and got the following result:

chisq = 55.90, ratio = 0.65, p = 0.995, logp = -0.00489
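overdisp_fun() is not part of lme4. It is a helper usually copied from Ben Bolker's GLMM FAQ, and the variant used here apparently also prints log(p). Its core is the Pearson chi-square divided by the residual degrees of freedom. A minimal sketch of that recipe, written as a hypothetical dispersion_stats() and tried on simulated Poisson data with a plain glm() so that no extra packages are needed:

```r
# Pearson-based dispersion statistic, following the recipe of the
# overdisp_fun() helper (dispersion_stats is a hypothetical name)
dispersion_stats <- function(model) {
  rdf   <- df.residual(model)                          # residual degrees of freedom
  chisq <- sum(residuals(model, type = "pearson")^2)   # Pearson chi-square
  c(chisq = chisq,
    ratio = chisq / rdf,                               # > 1 suggests over-, < 1 under-dispersion
    rdf   = rdf,
    p     = pchisq(chisq, df = rdf, lower.tail = FALSE))  # upper tail: tests for overdispersion only
}

# simulated data that really are Poisson (made up for illustration)
set.seed(1)
sim <- data.frame(x = rep(c("a", "b"), each = 45))
sim$y <- rpois(nrow(sim), lambda = ifelse(sim$x == "a", 4, 6))

fit <- glm(y ~ x, family = poisson, data = sim)
stats <- dispersion_stats(fit)
print(round(stats, 3))
```

Because p is an upper-tail probability, it only tests for overdispersion: a value near 1, such as the 0.995 above, is simply what a ratio well below 1 produces, not a sign of especially good fit. lme4 provides df.residual() and Pearson residuals for glmer() fits, so the same function can be applied to them.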

With a ratio of 0.65, does this mean my model is acceptable, or is it underdispersed? I also made a residual plot, which appears to be okay, but I am not sure whether this residual plot together with the 0.65 dispersion ratio indicates a good model fit.

I understand what I would do if I were running a generalized linear model with overdispersion (which seems to be the most common problem). With my GLMM, however, I am unsure:

1. whether my model fit is acceptable, and, if not,
2. what to do about it.

Best Answer

A ratio below 1 suggests that you have under-dispersion. However, 0.65 is not very small and might be due to sampling variation, particularly since you have only 24 observations in total, so I would not be too concerned at this point.
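To get a feel for how much the ratio moves by chance alone, one can simulate data that truly are Poisson and look at the spread of the Pearson ratio. A minimal sketch, assuming 24 observations split over two groups with means of roughly 4.5 and 6 (the design and means are made up for illustration) and a plain glm() instead of the GLMM:

```r
set.seed(42)
n_sim <- 1000
x  <- factor(rep(c("A", "B"), each = 12))   # 24 observations, two groups (hypothetical design)
mu <- exp(1.5 + 0.3 * (x == "B"))           # true Poisson means, roughly 4.5 and 6.0

# Pearson dispersion ratio from many correctly specified fits
ratios <- replicate(n_sim, {
  y   <- rpois(length(mu), mu)               # data that really are Poisson
  fit <- glm(y ~ x, family = poisson)
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
})

summary(ratios)
mean(ratios < 0.65)   # share of correct models that still show a ratio below 0.65
```

Even with the model exactly right, a noticeable share of simulated fits land below 0.65, which is the sense in which a single ratio from a small data set is noisy.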

Under-dispersion can arise from a poorly specified model. If you had more data, I would suggest trying a random-slopes model and possibly a model with an autocorrelation structure. Bootstrapping is another option, but again, with so little data it may not be very reliable.
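One form of bootstrap that fits this question is a parametric bootstrap of the dispersion ratio itself: simulate new responses from the fitted model, refit, and see where the observed ratio falls among the simulated ones. A minimal sketch with a Poisson glm() on made-up data (dat, g and y are hypothetical); for a glmer() fit, lme4's simulate() and refit() methods would play the same roles:

```r
set.seed(7)
dat <- data.frame(g = factor(rep(1:3, each = 8)))         # hypothetical 24-row data set
dat$y <- rpois(nrow(dat), lambda = c(3, 5, 8)[dat$g])

pearson_ratio <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

fit <- glm(y ~ g, family = poisson, data = dat)
obs_ratio <- pearson_ratio(fit)

# draw new responses from the fitted model and refit to each one
n_boot <- 500
sims <- simulate(fit, nsim = n_boot)                       # one column per simulated response
boot_ratios <- vapply(sims, function(y_new) {
  pearson_ratio(glm(y_new ~ g, family = poisson, data = dat))
}, numeric(1))

# two-sided bootstrap p-value for the observed ratio
p_low  <- mean(boot_ratios <= obs_ratio)
p_high <- mean(boot_ratios >= obs_ratio)
p_boot <- min(1, 2 * min(p_low, p_high))
c(observed = obs_ratio, p = p_boot)
```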

Note also that your random intercept has very low variance, so you might try comparing the model with a regular glm() and also using Conway-Maxwell-Poisson regression, available in the compoisson package for R, which specifically handles under-dispersed count data.
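To see why Conway-Maxwell-Poisson suits under-dispersed counts: it adds a parameter nu to the Poisson, with P(Y = y) proportional to lambda^y / (y!)^nu. Setting nu = 1 gives back the Poisson, and nu > 1 makes the variance smaller than the mean. A minimal base-R sketch of that probability mass function (dcmp and its truncation point max_y are assumptions for illustration, not the compoisson API):

```r
# Conway-Maxwell-Poisson pmf: lambda^y / (y!)^nu / Z(lambda, nu),
# with the normalising constant Z approximated by summing the series up to max_y
dcmp <- function(y, lambda, nu, max_y = 200) {
  support <- 0:max_y
  log_w <- support * log(lambda) - nu * lgamma(support + 1)
  log_z <- max(log_w) + log(sum(exp(log_w - max(log_w))))   # log-sum-exp for stability
  exp(y * log(lambda) - nu * lgamma(y + 1) - log_z)
}

y <- 0:200
p_pois  <- dcmp(y, lambda = 3, nu = 1)   # nu = 1: ordinary Poisson(3)
p_under <- dcmp(y, lambda = 3, nu = 2)   # nu > 1: under-dispersed

cmp_mean <- sum(y * p_under)
cmp_var  <- sum((y - cmp_mean)^2 * p_under)
c(mean = cmp_mean, variance = cmp_var)   # variance comes out below the mean
```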

Overdispersion Problem

It looks like you're modeling a count variable as a binomial, and I think that's the source of your overdispersion.

You could model everything as a binomial distribution, but the total for each observation is exactly the same. ~~Plus, the count of diseased plants never reaches the maximum of 100, so it's not really censored the way a binomial would be.~~

EDIT: Alternatively, you could report this as a "rate" of disease over the total sample, analyzing either the count of diseased plants or the proportion (disease / total) with a negative binomial model.
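A rate is usually modelled by keeping the count as the response and adding log(total) as an offset, so that the coefficients act on the log of dis / tot. A minimal sketch with MASS::glm.nb on simulated data (the columns dis, tot and trt mirror the names used in this thread, but the values are invented and the farm/block random effects are left out):

```r
library(MASS)   # glm.nb()

set.seed(3)
dinc_sim <- data.frame(trt = factor(rep(c("A", "B"), each = 30)),
                       tot = 100)                        # same total for every unit
rate <- ifelse(dinc_sim$trt == "A", 0.15, 0.25)          # made-up disease rates
dinc_sim$dis <- rnbinom(nrow(dinc_sim), size = 5, mu = rate * dinc_sim$tot)

# count response with a log(total) offset: exp(coef) are rates and rate ratios for dis / tot
m_rate <- glm.nb(dis ~ trt + offset(log(tot)), data = dinc_sim)
exp(coef(m_rate))
m_rate$theta
```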

EDIT2: Because there seems to be some hesitance to use a negative binomial, here is a list of recent phytopathology articles (same discipline as OP) that model disease as a negative binomial (Prager et al., 2014, Mori et al., 2008, Passey et al., 2017, Paiva de Almeida et al., 2016)

A histogram of your response variable looks like a zero-inflated negative binomial.

Note the long right tail that is typical of negative binomial or Poisson counts.
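Whether the zeros really are in excess can be checked before reaching for a zero-inflated model: compare the observed number of zeros with the number a fitted negative binomial expects. A minimal sketch on simulated counts (hypothetical data, with extra zeros mixed in on purpose):

```r
library(MASS)   # glm.nb()

set.seed(11)
n <- 120
trt <- factor(rep(c("A", "B"), each = n / 2))
counts <- rnbinom(n, size = 2, mu = ifelse(trt == "A", 4, 7))
counts[sample(n, 25)] <- 0                    # inject extra zeros for illustration

fit_nb <- glm.nb(counts ~ trt)
# expected zeros: sum of each observation's fitted probability of a zero
expected_zeros <- sum(dnbinom(0, size = fit_nb$theta, mu = fitted(fit_nb)))
observed_zeros <- sum(counts == 0)
c(observed = observed_zeros, expected = round(expected_zeros, 1))
```

If the observed zeros clearly exceed the expected ones, a zero-inflated model is worth trying; if they are close, the plain negative binomial is already absorbing them through a small theta.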

There are a few different ways to handle this, but here's an easy solution:

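library(lme4)  # glmer.nb()

# random intercepts for farm and for block within farm
m4 <- glmer.nb(dis ~ trt + (1 | farm/bk), data = dinc)

summary(m4)
overdisp_fun(m4)  # helper from the GLMM FAQ, not part of lme4; define it first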

I got the following overdispersion results:

      chisq       ratio         rdf           p 
122.1655582   1.0811111 113.0000000   0.2617332
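These four numbers hang together: assuming the GLMM FAQ version of overdisp_fun(), the ratio is chisq / rdf and p is the upper-tail chi-square probability of a Pearson statistic at least this large. Recomputing from the printed values:

```r
chisq <- 122.1655582
rdf   <- 113
ratio <- chisq / rdf                                 # 1.0811...
p     <- pchisq(chisq, df = rdf, lower.tail = FALSE) # upper-tail probability
c(ratio = ratio, p = p)
```

A p of about 0.26 means a Pearson statistic this large is unremarkable under the fitted negative binomial variance.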

Looks good, right? The ratio is close to 1 and p = 0.26, so there is no evidence of remaining overdispersion.

(EDIT: Ignore the struck-through portions below.)

~~### Side Issue: Your Trees are Independent Observations~~

At first, it looks like each of the two trees should be a random effect.

However, Tree 1 on farm 1 is not comparable to Tree 1 on farm 2, so you don't want to model Tree as a random effect. Imagine each tree were a different person: adding a random effect for each person wouldn't matter unless you had multiple observations per person.

~~Similarly, including the farm "block" doesn't really have an effect on the model.~~

Alternative Models and Final Thoughts

- You could check out a zero-inflated negative binomial model, although your dispersion doesn't look bad with a standard negative binomial.
- The MASS package offers an alternative way to fit a negative binomial mixed model (via glmmPQL()).
- You could also run this as a quasi-Poisson model.

I'll include some code below, in case you want to pursue this.

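 library(MASS)  # glmmPQL(), negative.binomial(); glmmPQL() uses nlme::lme() internally

 # negative binomial GLMM fitted by penalized quasi-likelihood;
 # theta is fixed, presumably at the value glmer.nb() estimated for m4
 m5 <- glmmPQL(dis ~ trt,
               random = ~ 1 | farm/bk,
               family = negative.binomial(theta = 9.86),
               data = dinc)

 summary(m5)

 # quasi-Poisson alternative with the same random-effects structure
 m6 <- glmmPQL(dis ~ trt,
               random = ~ 1 | farm/bk,
               family = quasipoisson(link = "log"),
               data = dinc)

 summary(m6)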
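The theta = 9.86 passed to negative.binomial() is a fixed value, presumably the theta that glmer.nb() estimated for m4 (in lme4, getME(m4, "glmer.nb.theta") should return it). The same hand-off can be checked without random effects: fit glm.nb(), then refit with negative.binomial() at the estimated theta, which should reproduce the coefficients. A minimal sketch on simulated data (all values invented):

```r
library(MASS)   # glm.nb(), negative.binomial()

set.seed(5)
trt <- factor(rep(c("ctrl", "trt1", "trt2"), each = 20))
dis <- rnbinom(60, size = 8, mu = c(10, 14, 6)[trt])

m_nb      <- glm.nb(dis ~ trt)                 # estimates theta along with the coefficients
theta_hat <- m_nb$theta
m_fix     <- glm(dis ~ trt, family = negative.binomial(theta = theta_hat))  # theta held fixed

cbind(glm.nb = coef(m_nb), fixed_theta = coef(m_fix))
```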

Best of luck with your model!

EDIT: In case you'd like to run this as a "rate", try this code:

library(MASS)  # glmmPQL()

dinc$dis_prob <- dinc$dis / dinc$tot  # proportion of diseased plants

m7 <- glmmPQL(dis_prob ~ trt,
              random = ~ 1 | farm/bk,
              family = quasipoisson(link = "log"),
              data = dinc)

summary(m7)
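Because every unit has the same total, modelling dis_prob = dis / tot with a log link gives the same coefficients as modelling the count dis with log(tot) as an offset. The offset form is the more common way to express a rate and also works when totals differ (the proportion form then needs weights = tot to match). A quick check with ordinary glm() on simulated data (values invented, random effects omitted):

```r
set.seed(9)
d <- data.frame(trt = factor(rep(c("A", "B", "C"), each = 10)), tot = 100)
d$dis <- rpois(nrow(d), lambda = c(12, 20, 30)[d$trt])
d$dis_prob <- d$dis / d$tot

# proportion as the response vs. count with a log(total) offset
m_prop   <- glm(dis_prob ~ trt, family = quasipoisson(link = "log"), data = d)
m_offset <- glm(dis ~ trt + offset(log(tot)), family = quasipoisson(link = "log"), data = d)

cbind(proportion = coef(m_prop), offset = coef(m_offset))
```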
