Modeling count data with left-skewed distribution

count-data, negative-binomial-distribution

I have 'count' data for species. My count is the number of 15-stop segments in which a species was detected (range 0-15). Sample sizes range from 350 to 400. I have about 20 predictor variables (e.g., the percentage of land cover type, rainfall in mm, number of assistants) and two interactions.
For most of my species, I have lots of zeros, a right-skewed distribution, and overdispersion. Hence, I was planning to use negative binomial regression in R (glm.nb). However, for two of my species, there are few 0 counts, many 15 counts, overdispersion, and a left-skewed distribution, as can be seen in the example below:

[Image: example distribution for one of the two left-skewed species]

My search online did not reveal any solution for modeling this type of count data. Should I be using a negative binomial model for species with this type of left-skewed distribution? Is there another type of count model I could use? I am also considering converting the data to the proportion of 15-stop segments in which the species was detected and running a linear model. Is that a wise step?

Alternatively, as discussed here, can I invert my variable to the number of 15-stop segments in which the species was not detected, and run a negative binomial regression on that?

Any help/suggestion is welcome.

Best Answer

Your data are bounded above by 15 (with a lot of actual 15s), so a negative binomial has no hope of being suitable: its support is unbounded, so it would put probability on counts above 15 that cannot occur.
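To make the boundedness problem concrete, here is a minimal sketch in plain Python (standard library only, chosen for illustration; the question itself uses R). It assumes a hypothetical species with a mean of 12 detections out of 15 and a hypothetical negative binomial size parameter of 2; neither value comes from the question's data. It compares how much probability each distribution places on impossible counts above 15.

```python
import math

N_STOPS = 15  # upper bound on the response, taken from the question


def nb_pmf(k, mu, theta):
    """Negative binomial pmf with mean mu and size theta (variance mu + mu**2 / theta)."""
    log_p = (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def binom_pmf(k, n, p):
    """Binomial pmf for k successes in n trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical values for illustration only (not estimated from any data).
mu, theta = 12.0, 2.0

# Probability the negative binomial assigns to counts that cannot occur (> 15).
nb_mass_above = 1.0 - sum(nb_pmf(k, mu, theta) for k in range(N_STOPS + 1))

# The binomial with the same mean keeps all of its probability on 0..15.
binom_total = sum(binom_pmf(k, N_STOPS, mu / N_STOPS) for k in range(N_STOPS + 1))

print(f"NB probability of a count above {N_STOPS}: {nb_mass_above:.3f}")
print(f"Binomial probability on 0..{N_STOPS}: {binom_total:.3f}")
```

Under these assumed parameters, a sizeable share of the negative binomial's probability falls on counts larger than 15, while the binomial puts none there.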

My first thought would be a binomial GLM; the binomial at least respects the constraint that the response lies in 0-15.
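As a rough sketch of what a binomial GLM does here, the following plain-Python code fits a logit-link binomial model with 15 trials per observation and a single predictor using Newton-Raphson. The data (`x`, `y`) and the function name `fit_binomial_logit` are made up for illustration. In R, the analogous fit would typically be written as `glm(cbind(y, 15 - y) ~ x, family = binomial)`.

```python
import math

N_STOPS = 15  # number of trials per observation, taken from the question


def fit_binomial_logit(x, y, n=N_STOPS, iters=25):
    """Fit logit(p) = b0 + b1 * x for y ~ Binomial(n, p) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0          # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0  # Fisher information (2 x 2, symmetric)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = n * p * (1.0 - p)  # binomial variance at the current fit
            r = yi - n * p         # raw residual
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) @ score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: x is a land-cover proportion, y is detections out of 15.
x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
y = [6, 9, 8, 11, 12, 14, 13, 15, 15]

b0, b1 = fit_binomial_logit(x, y)
fitted = [N_STOPS / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]

print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
print("fitted counts:", [round(f, 2) for f in fitted])
```

Because the fitted mean is 15 times a probability, every fitted count stays strictly between 0 and 15, which is the property the negative binomial lacks.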

It's possible that this is unsuitable -- you may need a quasi-binomial or a 0-inflated model or a beta-binomial or some other way of dealing with the overdispersion depending on its nature -- but the negative binomial isn't going to work at all.
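To illustrate one of these options, the sketch below computes a beta-binomial distribution on 0-15 in plain Python. The shape parameters `a = 4` and `b = 1` are hypothetical, chosen only so that the distribution is left-skewed with its largest probability at 15; they are not estimated from the question's data. The code compares the beta-binomial variance with that of a binomial having the same mean, which is the extra-binomial variation such a model is meant to absorb.

```python
import math

N_STOPS = 15  # number of trials per observation, taken from the question


def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_pmf(k, n, a, b):
    """Beta-binomial pmf: Binomial(n, p) with p drawn from Beta(a, b)."""
    log_p = (
        math.log(math.comb(n, k))
        + log_beta(k + a, n - k + b)
        - log_beta(a, b)
    )
    return math.exp(log_p)


# Hypothetical shape parameters for illustration only.
a, b = 4.0, 1.0

pmf = [betabinom_pmf(k, N_STOPS, a, b) for k in range(N_STOPS + 1)]
mean = sum(k * p for k, p in enumerate(pmf))
var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))

# Variance of a plain binomial with the same mean, for comparison.
p_bar = mean / N_STOPS
binom_var = N_STOPS * p_bar * (1.0 - p_bar)

print(f"beta-binomial mean = {mean:.2f}, variance = {var:.2f}")
print(f"binomial variance at the same mean = {binom_var:.2f}")
print(f"P(Y = 15) = {pmf[N_STOPS]:.3f}")
```

With these assumed parameters the beta-binomial variance is several times the binomial variance at the same mean, and the distribution still cannot exceed 15. In R, a quasi-binomial fit is available through `family = quasibinomial` in `glm`; beta-binomial regression needs an add-on package (glmmTMB's `betabinomial` family is one option, mentioned here as a pointer rather than a recommendation).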

It's hard to give much more specific advice outside of "don't use the negative binomial for this" without more context.