Binomial GLMM in R – Fitting glmer to Proportion Response Variables

binomial distribution, glmm, lme4-nlme, proportion, r

I'm hoping somebody can help with what I think is a relatively simple question. I think I know the answer, but without confirmation I just can't be certain of it.

I have some count data as a response variable and I want to measure how that variable changes with the proportional presence of something.

In more detail, the response variable is the number of samples in which an insect species is present at each site. For example, a site is sampled 10 times and the species may occur in 4 of those samples.

I want to see if this correlates with the proportional presence of a group of plant species in the overall community of plants at these sites.

This means my data look as follows (this is just an example):

Site, insectCount, NumberOfInsectSamples, ProportionalPlantGroupPresence
1, 5, 10, 0.5
2, 3, 10, 0.3
3, 7, 9, 0.6
4, 0, 9, 0.1
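For anyone who wants to reproduce the example, the four rows above can be entered as an R data frame. This is only the illustrative table from the question: it has no Location column, because those values are not given, so on its own it cannot be used for the mixed models below. The column `propPresent` is a hypothetical name added here for the observed proportion.

```r
# Example rows from the table above (illustrative data only; no Location column)
Data <- data.frame(
  Site = 1:4,
  insectCount = c(5, 3, 7, 0),
  NumberOfInsectSamples = c(10, 10, 9, 9),
  ProportionalPlantGroupPresence = c(0.5, 0.3, 0.6, 0.1)
)

# Hypothetical helper column: observed proportion of samples with the insect present
Data$propPresent <- Data$insectCount / Data$NumberOfInsectSamples
print(Data)
```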

The data also include a Location variable, which I want to use as a random effect.

I thought of two methods. One would be a linear mixed model (lmer) with the insect counts converted to proportions, e.g.

 lmer.model<-lmer(insectCount/NumberOfInsectSamples~
 ProportionalPlantGroupPresence+(1|Location),data=Data)

The second would be a binomial GLMM (glmer), e.g.

glmer.model <- glmer(cbind(insectCount,NumberOfInsectSamples-insectCount)~
 ProportionalPlantGroupPresence+(1|Location),
 data=Data,family="binomial")

I believe the binomial glmer is the correct method; however, the two models produce fairly different results. I can't seem to find a definitive answer online that removes my uncertainty, and I want to make sure I am not making a mistake.
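Part of the difference is simply scale. The sketch below fits plain (non-mixed) analogues of the two models to the four example rows with base R's `lm()` and `glm()`; Location is left out because its values are not given, so this is an assumption-laden simplification, not the mixed models themselves. The object names `fit_lm` and `fit_binom` are hypothetical.

```r
# Illustrative rows from the question; Location is omitted (values not given),
# so these are non-mixed analogues of the lmer() and glmer() fits.
Data <- data.frame(
  insectCount = c(5, 3, 7, 0),
  NumberOfInsectSamples = c(10, 10, 9, 9),
  ProportionalPlantGroupPresence = c(0.5, 0.3, 0.6, 0.1)
)

# Analogue of the lmer() fit: proportion as a Normal response, identity link
fit_lm <- lm(insectCount / NumberOfInsectSamples ~ ProportionalPlantGroupPresence,
             data = Data)

# Analogue of the glmer() fit: successes/failures, binomial family, logit link
fit_binom <- glm(cbind(insectCount, NumberOfInsectSamples - insectCount) ~
                   ProportionalPlantGroupPresence,
                 data = Data, family = binomial)

# Slopes are on different scales: change in proportion vs change in log-odds
print(coef(fit_lm))
print(coef(fit_binom))

# A fairer comparison: fitted proportions at each site on the probability scale
print(cbind(lm = fitted(fit_lm), binomial = fitted(fit_binom)))
```

With only four rows the numbers themselves mean little; the point is that the raw coefficients of the two fits are not directly comparable, while the fitted proportions are. The same scale difference applies to the lmer() and glmer() versions.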

Any help or insight into alternative methods on this would be much appreciated.

Best Answer

The binomial GLMM is probably the right answer.

  • Especially with a small to moderate number of samples per site (9 and 10 in your example), the distribution of the response variable will probably be heteroscedastic (the variance will not be constant: for a proportion based on N binomial samples it is p(1-p)/N, so it depends systematically on the mean) and far from Normal, in a way that will be hard to transform away, especially if the proportions are close to 0 or 1 for some values of the predictor variable. That makes the GLMM a good idea.
  • You should be careful to check for and account for overdispersion. If you have a single observation (i.e. a single binomial sample/row in your data frame) per location, then your (1|Location) random effect will automatically handle this (although see Harrison 2015 for a cautionary note).
  • If the previous assumption is right (you only have a single binomial sample per location), then you can also fit this as a regular binomial model (glm(..., family=binomial)); in that case you can also use a quasibinomial model (family=quasibinomial) as a simpler, alternative way to account for overdispersion.
  • If you like, you can also fit your GLMM with the proportion as the response, provided you set the weights argument equal to the number of samples:

     glmer(insectCount/NumberOfInsectSamples~ProportionalPlantGroupPresence+
           (1|Location),
           weights=NumberOfInsectSamples,
           data=Data,family="binomial")
    

    (this should give identical results to the glmer() fit you have in your question).
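The equivalence of the two response formats, and the quasibinomial alternative, can be checked with base R's `glm()` on the four example rows. This is a sketch under stated assumptions: it uses `glm()` rather than `glmer()` because the example has no Location values, and the object names `fit_cbind`, `fit_weights` and `fit_quasi` are hypothetical.

```r
# Illustrative rows from the question (no Location column, so plain glm() is used)
Data <- data.frame(
  insectCount = c(5, 3, 7, 0),
  NumberOfInsectSamples = c(10, 10, 9, 9),
  ProportionalPlantGroupPresence = c(0.5, 0.3, 0.6, 0.1)
)

# Form 1: two-column response of successes and failures
fit_cbind <- glm(cbind(insectCount, NumberOfInsectSamples - insectCount) ~
                   ProportionalPlantGroupPresence,
                 data = Data, family = binomial)

# Form 2: proportion as the response, weighted by the number of samples
fit_weights <- glm(insectCount / NumberOfInsectSamples ~
                     ProportionalPlantGroupPresence,
                   weights = NumberOfInsectSamples,
                   data = Data, family = binomial)

# The coefficient estimates agree up to numerical tolerance
print(all.equal(coef(fit_cbind), coef(fit_weights)))

# Quasibinomial fit: same point estimates, but standard errors are scaled
# by an estimated dispersion parameter to allow for overdispersion
fit_quasi <- update(fit_cbind, family = quasibinomial)
print(summary(fit_quasi)$dispersion)
```

A dispersion estimate well above 1 would suggest overdispersion; with only four rows (two residual degrees of freedom) this particular estimate is not informative.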

Harrison, Xavier A. “A Comparison of Observation-Level Random Effect and Beta-Binomial Models for Modelling Overdispersion in Binomial Data in Ecology and Evolution.” PeerJ 3 (July 21, 2015): e1114. doi:10.7717/peerj.1114.
