Solved – How to account for a lack of fit using a quasi-poisson on non-integer, overdispersed data

Tags: glmm, mixed model, poisson distribution, r

I am trying to fit a mixed model to over-dispersed, non-integer data. My data are not counts, but they are zero-inflated and over-dispersed. The response is distance (how far a GPS point is from a central location), so the values look like 0.33, 64.73, 5.2, etc. I have been using a quasi-Poisson distribution because I have read that quasi families can handle non-integer data (whereas Poisson and negative binomial cannot). I am using the glmmPQL function in the MASS package, as it allows quasi distributions with a random term (the identity of the individual that the GPS point comes from). The functions glmm and lmer do not work with a quasi-Poisson distribution.

Plotting the residuals indicates a lack of fit for this model. Log-transforming the data beforehand to make it normal also fails (the Shapiro test for normality is significant). I am unsure how to fix this: I seemingly have to use a quasi distribution (link="log") because my data are non-integer, not counts, and not normal, yet there is still overdispersion and lack of fit with this distribution.

My question, therefore, is: how can I model over-dispersed, non-integer data in a mixed model when quasi-Poisson does not seem to work?

My code so far is:

library(MASS)  # provides glmmPQL
summary(glmmPQL(distance_from_centroid ~ Chick.Juv.Adult + Summer_winter,
                random = ~ 1 | markingnumber, family = quasipoisson(link = "log"),
                data = centroid_distances))

Which results in:

Linear mixed-effects model fit by maximum likelihood
 Data: centroid_distances 
  AIC BIC logLik
   NA  NA     NA

Random effects:
 Formula: ~1 | markingnumber
        (Intercept) Residual
StdDev:    1.157381 2.136811

Variance function:
 Structure: fixed weights
 Formula: ~invwt 
Fixed effects: distance._from_centroid ~ Chick.Juv.Adult + Summer_winter 
                      Value  Std.Error  DF   t-value p-value
(Intercept)       2.0670095 0.09403952 695 21.980221  0.0000
Chick.Juv.AdultC -0.2945360 0.06686399 695 -4.405002  0.0000
Chick.Juv.AdultJ -0.2005831 0.06727181 695 -2.981682  0.0030
Summer_winterW    0.1207721 0.04324588 695  2.792684  0.0054
 Correlation: 
                 (Intr) C.J.AC C.J.AJ
Chick.Juv.AdultC -0.565              
Chick.Juv.AdultJ -0.512  0.736       
Summer_winterW   -0.267  0.134  0.043

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-2.53759073 -0.48277169 -0.31041612  0.06314122  7.48672836 

Number of Observations: 1009
Number of Groups: 311 

Plotting the residuals gives me:

[Figure: plot of residuals]

Best Answer

There is not really such a thing as "over-dispersed data" in the abstract. Over-dispersion means that the variability of the data is greater than expected, and without a specific context there is no "expected" dispersion. An expected variability exists in just a few specific situations, usually involving count data: for example, binomial sampling (number of successes out of $n$ trials) and Poisson sampling (number of events over a period of time with a constant instantaneous event rate). For these settings one can derive the distribution, and it turns out that the variance is a function of the mean. For example, for the binomial $E(X)=np$ and $\operatorname{Var}(X)=np(1-p)$, while for the Poisson $E(X)=\operatorname{Var}(X)=\lambda$. So once you know the mean, you know what the variance should be. If you have count data generated by a Poisson-like sampling process with some deviations (e.g. individuals have their own event rates) and you find that $\operatorname{Var}(X)>E(X)$, then you can talk of over-dispersion relative to the Poisson distribution.
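The mean-variance relationship described above can be checked with a small simulation. This is only an illustrative sketch: the rate of 5, the sample size, and the gamma distribution used for the individual rates are arbitrary assumptions, not anything taken from the question's data.

```r
set.seed(42)
n <- 1e5

# Pure Poisson sampling: every observation shares the same rate,
# so the sample variance should be close to the sample mean.
x_pois <- rpois(n, lambda = 5)
c(mean = mean(x_pois), var = var(x_pois))

# Poisson-like sampling where each individual has its own rate
# (rates drawn from a gamma distribution with mean 5; an assumption).
lambda_i <- rgamma(n, shape = 2, rate = 2 / 5)
x_mix <- rpois(n, lambda = lambda_i)
c(mean = mean(x_mix), var = var(x_mix))
```

In the second case the mean stays near 5 while the variance is clearly larger, which is what "over-dispersion relative to the Poisson" refers to. The comparison only makes sense because the Poisson supplies a reference variance in the first place.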

You give no reason to think that the process generating the data resembles Poisson sampling (the values are not even integer counts), so there is no reason why the variance should or could equal the mean or any other specific value. The concept of "over-dispersion" therefore does not apply. There are many distributions with two or more parameters that can model the mean and the variance separately. Since your data seem to be positive and skewed, perhaps the gamma distribution would work (if the log-normal did not). There are many other possible approaches to modeling your data; just forget about the straitjacket of the Poisson distribution when it really does not apply.
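As a rough sketch of the gamma suggestion, glmmPQL accepts family = Gamma(link = "log") with the same random-intercept structure used in the question. The data below are simulated stand-ins, and the names sim, id, season and distance are hypothetical, not the asker's variables. Note that the gamma distribution requires strictly positive responses, so any exact zeros in the real data would need separate handling, which this sketch does not attempt.

```r
library(MASS)  # glmmPQL; the nlme package must also be installed

set.seed(1)
# Simulated stand-in data (an assumption): 50 individuals, 4 fixes each
n_id  <- 50
n_obs <- 4
sim <- data.frame(
  id     = factor(rep(seq_len(n_id), each = n_obs)),
  season = factor(rep(c("S", "W"), times = n_id * n_obs / 2))
)

# Random intercept per individual on the log scale
b_id <- rnorm(n_id, sd = 0.5)
mu   <- exp(2 + 0.3 * (sim$season == "W") + b_id[as.integer(sim$id)])

# Gamma responses with mean mu and variance mu^2 / 2 (strictly positive)
sim$distance <- rgamma(nrow(sim), shape = 2, rate = 2 / mu)

# Gamma mixed model with a log link and a random intercept per individual
fit <- glmmPQL(distance ~ season, random = ~ 1 | id,
               family = Gamma(link = "log"), data = sim, verbose = FALSE)
summary(fit)$tTable
```

With real data, the same call with the question's formula and random term would be the starting point, followed by the same residual checks that exposed the lack of fit under quasi-Poisson.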