My general preference, when comparing a more complex model (here, NB) to a less complex one (here, Poisson), is not to rely on any statistical test but to run both and see whether the predicted values are substantially different (what "substantially" means depends on the field you are working in). If they are, prefer the more complex model; if not, the simpler one.
This allows us not to rely on arbitrary cutoffs; it requires us to employ judgement. Those are, in my opinion, good things.
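As a rough illustration of that comparison, here is a minimal sketch on simulated data (the data, variable names, and the relative-difference summary are my assumptions, not anything from the question). It fits both models and looks at how far apart the predicted means are; whether the gap is "substantial" is still your call.

```r
library(MASS)  # glm.nb(); MASS ships with standard R installations

set.seed(42)
n <- 200
x <- rnorm(n)
# Simulated overdispersed counts (hypothetical data, not the OP's)
y <- rnbinom(n, mu = exp(0.5 + 0.8 * x), size = 1.5)
dat <- data.frame(x = x, y = y)

fit_pois <- glm(y ~ x, family = poisson, data = dat)
fit_nb   <- glm.nb(y ~ x, data = dat)

# Predicted means on the response scale from each model
pred <- data.frame(pois = fitted(fit_pois), nb = fitted(fit_nb))
pred$rel_diff <- (pred$nb - pred$pois) / pred$pois

# Distribution of relative differences, and the largest one in absolute value
summary(pred$rel_diff)
max_rel_diff <- max(abs(pred$rel_diff))
max_rel_diff
```

Depending on the application, you may also want to compare standard errors or prediction intervals, since the two models can agree on the means and still differ in their uncertainty.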
### Overdispersion Problem
It looks like you're modeling a count variable as a binomial, and I think that's the source of your overdispersion.
You could model everything as a binomial, but the total for each observation is exactly the same, so the denominator adds no information. Plus, the count of diseased plants never reaches the maximum of 100, so it's not really bounded above the way a binomial outcome would be.
EDIT: So, you could easily report this as a "rate" of disease over the total sample. In this way you could analyze the 'count' of disease or proportion (disease / total) as a negative binomial model.
EDIT2: Because there seems to be some hesitance to use a negative binomial, here is a list of recent phytopathology articles (same discipline as OP) that model disease as a negative binomial: Prager et al., 2014; Mori et al., 2008; Passey et al., 2017; Paiva de Almeida et al., 2016.
A histogram of your y variable looks like a zero inflated negative binomial.
Note the long right tail that you typically see with a negative binomial or Poisson.
There are a few different ways to handle this, but here's an easy solution:
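    library(lme4)  # provides glmer.nb()
    m4 <- glmer.nb(dis ~ trt + (1 | farm/bk), data = dinc)
    summary(m4)
    # overdisp_fun() is not part of lme4; it must already be defined in your session
    overdisp_fun(m4)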
I got the following overdispersion results:
          chisq       ratio         rdf           p
    122.1655582   1.0811111 113.0000000   0.2617332
Looks good, right?
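In case overdisp_fun() isn't defined in your session: it is not part of lme4. The sketch below follows the helper from Ben Bolker's GLMM FAQ, which I'm assuming is the one in use here, since its output has the same four fields. The ratio is just the Pearson chi-square divided by the residual degrees of freedom (122.17 / 113 ≈ 1.08 above). The toy Poisson fit at the end is made-up data, only there to show the call.

```r
# Pearson-based overdispersion check (sketch after the GLMM FAQ helper)
overdisp_fun <- function(model) {
  rdf <- df.residual(model)                  # residual degrees of freedom
  rp <- residuals(model, type = "pearson")   # Pearson residuals
  pearson_chisq <- sum(rp^2)
  c(chisq = pearson_chisq,
    ratio = pearson_chisq / rdf,             # values near 1 suggest no overdispersion
    rdf = rdf,
    p = pchisq(pearson_chisq, df = rdf, lower.tail = FALSE))
}

# Toy example on simulated Poisson data (hypothetical, not the OP's data)
set.seed(1)
toy <- data.frame(x = rnorm(100))
toy$y <- rpois(100, lambda = exp(0.3 + 0.5 * toy$x))
res <- overdisp_fun(glm(y ~ x, family = poisson, data = toy))
res
```

The same function can be called on a glmer() or glmer.nb() fit, because df.residual() and Pearson residuals are available for those models too.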
(EDIT: Ignore the "Side Issue" section below, which was struck through.)
### Side Issue: Your Trees are Independent Observations
At first, it looks like each of the two trees should be a random effect.
However, Tree 1 on farm 1 is not comparable to Tree 1 in farm 2. Therefore, you don't want to model the effect of Tree as a random effect. Imagine if each Tree was a different person. Adding a random effect for each person wouldn't matter unless you had multiple observations per person.
Similarly, including the farm "block" doesn't really have an effect on the model.
### Alternative Models and Final Thoughts
- Could potentially check out zero inflated negative binomial
- Although your dispersion doesn't seem bad with standard nb
- The MASS package is an alternative way to run a nb model
- Additionally you could run this as a Quasi-Poisson
- I'll include some code below, in case you want to pursue this
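    library(MASS)  # provides glmmPQL() and negative.binomial()

    # Negative binomial mixed model via penalized quasi-likelihood, theta held fixed at 9.86
    m5 <- glmmPQL(dis ~ trt,
                  random = ~ 1 | farm/bk,
                  family = negative.binomial(theta = 9.86),
                  data = dinc)
    summary(m5)

    # Quasi-Poisson alternative with the same random-effects structure
    m6 <- glmmPQL(dis ~ trt,
                  random = ~ 1 | farm/bk,
                  family = quasipoisson(link = 'log'),
                  data = dinc)
    summary(m6)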
Best of luck with your model!
EDIT: In case you'd like to run this as a "rate", please try this code:
    # Proportion of diseased plants out of the total sampled
    dinc$dis_prob <- dinc$dis / dinc$tot
    m7 <- glmmPQL(dis_prob ~ trt,
                  random = ~ 1 | farm/bk,
                  family = quasipoisson(link = 'log'),
                  data = dinc)
    summary(m7)
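One thing worth knowing about the rate version: when the total is the same for every observation, dividing by it only shifts the intercept by log(total) and leaves the treatment coefficients alone. The sketch below shows this with a plain glm() on simulated data (no random effects, made-up treatment levels and means, all my assumptions), and also includes an offset(log(tot)) fit, which is another common way to model a rate.

```r
set.seed(7)
# Hypothetical data: three treatments, constant total of 100 per observation
toy <- data.frame(trt = factor(rep(c("a", "b", "c"), each = 20)), tot = 100)
toy$dis <- rpois(60, lambda = c(5, 10, 20)[as.integer(toy$trt)])
toy$dis_prob <- toy$dis / toy$tot

# Count response, rate response, and count response with a log(total) offset
fit_count  <- glm(dis ~ trt, family = quasipoisson(link = "log"), data = toy)
fit_rate   <- glm(dis_prob ~ trt, family = quasipoisson(link = "log"), data = toy)
fit_offset <- glm(dis ~ trt + offset(log(tot)),
                  family = quasipoisson(link = "log"), data = toy)

# Treatment effects agree across all three; only the intercept moves
rbind(count = coef(fit_count), rate = coef(fit_rate), offset = coef(fit_offset))
```

If the totals varied between observations, the count and rate fits would no longer be equivalent, and the offset form would usually be the one to prefer.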
Best Answer
A ratio below 1 suggests that you have under-dispersion. However, 0.69 is not very small and might be due to sampling variation, particularly since you have only 24 observations in total, so I would not be too concerned at this point.
Under-dispersion can arise from a poorly specified model. If you had more data I would suggest trying a random slopes model and possibly a model with an autocorrelation structure. Bootstrapping is another option, but again, with so little data, it may not be very reliable.
Note also that your random intercept has very low variance, so you might try comparing the model with a regular glm(), and also using Conway-Maxwell-Poisson regression, available in the compoisson package for R, which specifically handles under-dispersed count data.