Use of Gamma Distribution for count data

Tags: count-data, gamma distribution, lme4-nlme, poisson distribution

I am analysing insect abundance as a function of landscape variables, with a nested random effect.
Since I collected the individuals in the field, I have count data and thought a Poisson distribution would fit best (using lme4).
Currently my model looks like this example: glmer(insects~landscape1*landscape2 +(1|region/location), family="poisson", data=..). There is no overdispersion.

According to "A practical guide to mixed models in R" http://ase.tufts.edu/gsc/gradresources/guidetomixedmodelsinr/mixed%20model%20guide.html
I "should use" a lognormal or gamma distribution, since they fit best.
I tried lme with a log-transformed response and glmer with a gamma family, even though my data are not continuous. Both give similar results, which differ from those of the glmer with a Poisson distribution. I also tried ord_plot() and distplot() (positive slope, negative intercept), which indicated that log.series would be the best choice again (according to Friendly http://www.datavis.ca/courses/grcat/grcat.pdf chapter 2.3.).

I don't want to use the log-transformed approach, but I was wondering whether gamma can also be used for discrete count data or only for continuous data.
Or can you suggest any alternatives to Poisson for count data?
I hope to get some new insights.
Thank you


Here are the two outputs of the methods I followed to decide on the right distribution.
There is one observation with a count of 0.
I am not that familiar with statistics yet, but I have read a lot that log-transformed data are difficult to interpret and that it is better to use another method if the data are not normally distributed. So I thought there should be another way.

Output of qqp() following Friendly

ord_plot()

Best Answer

I think that you have been misled by "A practical guide to mixed models in R". The tests for normal, Poisson, gamma, etc. distributions proposed on that site look at the distribution of the response variable without reference to the values of the predictors. That's not helpful, and if that's what you did the results are not pointing you in the correct direction.

For example, in a simple linear regression with a true linear relation between predictor and response, the response variable will tend to follow the distribution of the predictor variable. If the predictor variable doesn't have a normal distribution there is no reason to suspect that the response variable has a normal distribution. The errors between predicted and observed response-variable values ideally are normal, but little is to be gained by looking at the distribution of response-variable values while ignoring the corresponding predictors.
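
As a rough illustration of this point, here is a small simulation sketch in base R. The variable names, sample size and coefficients are invented for the example and are not taken from the insect data. It draws a strongly skewed predictor, builds a response that is exactly linear in it with normal errors, and compares the skewness of the raw response with that of the residuals.

```r
set.seed(1)
n <- 2000
x <- rexp(n, rate = 1)                 # skewed (exponential) predictor, made up
y <- 2 + 3 * x + rnorm(n, sd = 0.5)    # truly linear relation with normal errors

fit <- lm(y ~ x)

# Sample skewness: roughly 0 for a symmetric distribution
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_y   <- skew(y)                # the response inherits the skew of x
skew_res <- skew(residuals(fit))   # the residuals are close to symmetric

round(c(response = skew_y, residuals = skew_res), 2)
```

A normality check on y alone would reject normality here, even though the linear model with normal errors is exactly right.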

This becomes even more of an issue when you have multiple predictors and interactions, as in your case. There is no reason to suspect that your response variable values overall show any single distribution; the tests you performed based on that website's suggestions thus may be leading you astray. Your responses may actually be Poisson-distributed, but with the mean value of the Poisson distribution depending on the values of the predictors. As I read that website and look at your plots, it seems that the values of the predictors are nowhere considered. The points with highest quantiles on your Poisson plot may well be those whose predictor variables predict higher insect numbers.
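
The same effect can be sketched for counts. In the base R simulation below, every count is drawn from a Poisson distribution, but the Poisson mean depends on a predictor. The predictor name, its range and the coefficients are hypothetical. Pooled over all predictor values the counts look badly non-Poisson (variance much larger than the mean), while within a narrow slice of predictor values the variance is close to the mean, as a Poisson distribution requires.

```r
set.seed(2)
n <- 5000
landscape <- runif(n, 0, 3)           # hypothetical predictor
mu <- exp(0.5 + 1.0 * landscape)      # Poisson mean depends on the predictor
counts <- rpois(n, lambda = mu)

# Pooled over all predictor values: variance far exceeds the mean
pooled_ratio <- var(counts) / mean(counts)

# Within a narrow slice of predictor values: variance is close to the mean
slice <- counts[landscape > 1.45 & landscape < 1.55]
slice_ratio <- var(slice) / mean(slice)

round(c(pooled = pooled_ratio, slice = slice_ratio), 2)
```

A distribution plot of the pooled counts would therefore point away from Poisson, although the data are conditionally Poisson by construction.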

Ignore those plots, and instead fit a model that shows well-behaved residuals. A generalized linear model based on the Poisson family would be a good place to start; that will allow for Poisson-based responses whose mean values are related to the predictors. As @Björn suggested in another answer, related approaches might be useful if Poisson in its simplest form fails.
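
One common way to judge whether a Poisson fit is adequate is the Pearson dispersion statistic, which should be near 1 when the Poisson variance assumption holds. The sketch below uses a plain glm() on simulated data so that it runs in base R; the data frame d and its coefficients are invented, and the random effects from the question are left out for simplicity. The same kind of residual-based check is commonly applied to glmer fits as well.

```r
set.seed(3)
n <- 500
# Hypothetical data standing in for the real insect counts
d <- data.frame(landscape1 = runif(n), landscape2 = runif(n))
d$insects <- rpois(n, exp(1 + 0.8 * d$landscape1 - 0.5 * d$landscape2))

fit <- glm(insects ~ landscape1 * landscape2, family = poisson, data = d)

# Pearson dispersion statistic: sum of squared Pearson residuals
# divided by the residual degrees of freedom
pearson <- residuals(fit, type = "pearson")
dispersion <- sum(pearson^2) / df.residual(fit)
dispersion
```

Values well above 1 would suggest overdispersion. Plotting pearson against fitted(fit) is a further check; it should show no trend or funnel shape.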

Finally, I don't at all agree that log-transformed variables are hard to interpret. Changes in log-scale units are simply fractional or percentage changes in the original units. Sometimes percentage changes are even easier to think about.
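
To spell out the arithmetic behind this, here is the standard algebra, with \beta standing for a generic change on the natural-log scale (a symbol introduced here, not a value from the question):

```latex
\log y_2 - \log y_1 = \beta
\;\Longrightarrow\;
\frac{y_2}{y_1} = e^{\beta},
\qquad
\text{percentage change} = 100\,(e^{\beta} - 1)\,\%
\;\approx\; 100\,\beta\,\% \quad \text{for small } |\beta|.
```

For instance, \beta = 0.05 gives e^{0.05} \approx 1.051, an increase of about 5.1%. Coefficients from a Poisson model with the default log link can be read the same way, as multiplicative changes in the expected count.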
