Binomial GLMM in R – Fitting Glmer to Proportion Response Variables

Tags: binomial-distribution, glmm, lme4-nlme, proportion, r

I'm hoping somebody can help with what I think is a relatively simple question, and I think I know the answer but without confirmation it has become something I just can't be certain of.

I have some count data as a response variable and I want to measure how that variable changes with the proportional presence of something.

In more detail, the response variable is counts of the presence of an insect species in a number of sites, so for example a site is sampled 10 times and this species may occur 4 times.

I want to see if this correlates with the proportional presence of a group of plant species in the overall community of plants at these sites.

This means my data looks as follows (this is just an example):

Site, insectCount, NumberOfInsectSamples, ProportionalPlantGroupPresence
1, 5, 10, 0.5
2, 3, 10, 0.3
3, 7, 9, 0.6
4, 0, 9, 0.1

The data also includes a random effect for location.

I thought of two methods. One would be a linear mixed model (lmer) with the insect counts converted to a proportion, e.g.

 lmer.model<-lmer(insectCount/NumberOfInsectSamples~
 ProportionalPlantGroupPresence+(1|Location),data=Data)

The second would be a binomial GLMM (glmer)
e.g.

glmer.model <- glmer(cbind(insectCount,NumberOfInsectSamples-insectCount)~
 ProportionalPlantGroupPresence+(1|Location),
 data=Data,family="binomial")

I believe the binomial glmer to be the correct method; however, the two models produce fairly different results. I can't seem to find a definitive answer online without still feeling slightly uncertain, and I want to make sure I am not making a mistake.

Any help or insight into alternative methods on this would be much appreciated.

Best Answer

The binomial GLMM is probably the right answer.

- Especially with a small to moderate number of samples (9 and 10 in your example), the distribution of the response variable will probably be heteroscedastic (the variance will not be constant, and in particular will depend on the mean in systematic ways) and far from Normal, in a way that will be hard to transform away, especially if the proportions are close to 0 or 1 for some values of the predictor variable. That makes the GLMM a good idea.
- You should be careful to check for/account for overdispersion. If you have a single observation (i.e. a single binomial sample/row in your data frame) per location, then your (1|Location) random effect will automatically handle this (although see Harrison 2015 for a cautionary note).
- If the previous assumption is right (you only have a single binomial sample per location), then you can also fit this as a regular binomial model (glm(...,family=binomial)); in that case you can also use a quasibinomial model (family=quasibinomial) as a simpler, alternative way to account for overdispersion.
- If you like, you can also fit your GLMM with the proportion as the response, if you set the weights argument to equal the number of samples:
```
 glmer(insectCount/NumberOfInsectSamples~ProportionalPlantGroupPresence+
       (1|Location),
       weights=NumberOfInsectSamples,
       data=Data,family="binomial")
```
(this should give identical results to the glmer() fit you have in your question).
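
The overdispersion check and the quasibinomial alternative mentioned above can be sketched roughly as follows. This is only an illustration in base R on simulated data: the data frame `dat`, the sample sizes, and the effect sizes are all made up, and the Pearson-based ratio is one common rule of thumb rather than a definitive test.

```r
# Simulated example; all names and parameter values here are hypothetical.
set.seed(1)
n_sites <- 40
dat <- data.frame(
  Site = factor(seq_len(n_sites)),
  NumberOfInsectSamples = sample(9:10, n_sites, replace = TRUE),
  ProportionalPlantGroupPresence = runif(n_sites)
)
# Site-level noise on the logit scale creates extra-binomial variation.
eta <- -1 + 2 * dat$ProportionalPlantGroupPresence + rnorm(n_sites, sd = 1)
dat$insectCount <- rbinom(n_sites, dat$NumberOfInsectSamples, plogis(eta))

# Plain binomial GLM: one binomial sample (row) per site, no random effect.
fit_bin <- glm(cbind(insectCount, NumberOfInsectSamples - insectCount) ~
                 ProportionalPlantGroupPresence,
               data = dat, family = binomial)

# Pearson chi-square divided by residual df; values well above 1
# suggest overdispersion.
dispersion <- sum(residuals(fit_bin, type = "pearson")^2) / df.residual(fit_bin)
dispersion

# Quasibinomial refit: same coefficients, standard errors inflated
# by sqrt(dispersion).
fit_quasi <- update(fit_bin, family = quasibinomial)
summary(fit_quasi)$coefficients
```

The point estimates do not change between the two fits; only the standard errors (and hence the tests) are adjusted.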

Harrison, Xavier A. “A Comparison of Observation-Level Random Effect and Beta-Binomial Models for Modelling Overdispersion in Binomial Data in Ecology and Evolution.” PeerJ 3 (July 21, 2015): e1114. doi:10.7717/peerj.1114.

Related Solutions

Solved – Binomial glmer with data between 0-1, not count data, not normal proportion

In my opinion, you are modeling the establishment limitation; you just made a mistake with the weights argument. From the R help:

For the binomial and quasibinomial families the response can be specified in one of three ways:

As a factor: ‘success’ is interpreted as the factor not having the first level (and hence usually of having the second level).

As a numerical vector with values between 0 and 1, interpreted as the proportion of successful cases (with the total number of cases given by the weights).

As a two-column integer matrix: the first column gives the number of successes and the second the number of failures.

Your data clearly fits in the second case, but you have to supply the total number of cases to weights.
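
To make the three specifications concrete, here is a small sketch using base R's glm() on made-up data (the data frame `dat` and its column names are hypothetical, not the asker's data). All three forms should lead to the same coefficient estimates, up to numerical tolerance.

```r
# Hypothetical toy data, loosely modeled on a successes-out-of-trials layout.
dat <- data.frame(
  successes = c(5, 3, 7, 0, 6, 2),
  trials    = c(10, 10, 9, 9, 10, 8),
  x         = c(0.5, 0.3, 0.6, 0.1, 0.7, 0.2)
)

# Two-column matrix: successes in the first column, failures in the second.
fit_matrix <- glm(cbind(successes, trials - successes) ~ x,
                  data = dat, family = binomial)

# Proportion between 0 and 1, with the number of trials given as weights.
fit_prop <- glm(successes / trials ~ x, weights = trials,
                data = dat, family = binomial)

# Factor response: one row per trial; "success" is the second factor level.
long <- data.frame(
  x = rep(dat$x, dat$trials),
  outcome = factor(unlist(Map(function(s, n) rep(c("yes", "no"), c(s, n - s)),
                              dat$successes, dat$trials)),
                   levels = c("no", "yes"))
)
fit_factor <- glm(outcome ~ x, data = long, family = binomial)

# Side-by-side coefficients from the three specifications.
cbind(matrix = coef(fit_matrix), prop = coef(fit_prop), factor = coef(fit_factor))
```

The same response conventions apply when the model is fitted with glmer(), which is why supplying the totals through weights matters for a proportion response.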

The following code works for me:

model1 <- glmer(Est.Limit ~ Treatment + log(Size) + (1|Species),
                data    = Limitdr.Est,
                family  = "binomial",
                weights = Seed.sites)

the result:

> summary(model1)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: binomial  ( logit )
Formula: Est.Limit ~ Treatment + log(Size) + (1 | Species)
   Data: x
Weights: Seed.sites

     AIC      BIC   logLik deviance df.resid 
   110.3    117.8    -50.2    100.3       28 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.7705 -0.4858 -0.1659  0.6247  2.0272 

Random effects:
 Groups  Name        Variance Std.Dev.
 Species (Intercept) 4.519    2.126   
Number of obs: 33, groups:  Species, 11

Fixed effects:
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)           0.9418     0.8370   1.125 0.260523    
TreatmentIsland       0.5172     0.3926   1.317 0.187700    
TreatmentPlantation   1.6060     0.4670   3.439 0.000583 ***
log(Size)            -1.0333     0.3793  -2.724 0.006442 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) TrtmnI TrtmnP
TrtmntIslnd -0.230              
TrtmntPlntt -0.189  0.466       
log(Size)   -0.479 -0.041 -0.100

Hope this helps, Cheers!

Solved – Percentages as the response variable in GLMM (glmer), proportional binomial or not

I think that you might simply use a discrete binomial GLMM rather than a continuous one, which is just slightly different from the model you described. By the way, the warning message you mentioned is not an error: it simply notifies you that the binomial response variable was continuous (i.e. it contained non-integer values, rather than only zeros and ones). If the non-integer values are between zero and one, there should be no problem.

If I understand correctly, you are interested in whether the structural vegetation coverage influences the presence of P. malvae eggs, correct? In that case, Occupancy should be the response variable, because you expect the presence of eggs to change in response to other environmental variables. Having cause and effect in the right order helps to make sense of the results of such ecological models. In my opinion, the model you might want to use would look something like this:
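
As a rough illustration, assuming the warning in question is R's standard "non-integer #successes in a binomial glm!" message, the following base R sketch (with hypothetical vectors `prop`, `trials`, and `x`) shows when it appears and how supplying the totals as weights avoids it.

```r
# Hypothetical proportions, totals, and predictor, for illustration only.
prop   <- c(0.5, 0.3, 0.6, 0.1)
trials <- c(10, 10, 10, 10)
x      <- c(1, 2, 3, 4)

# Collect warnings instead of printing them, so the fit still completes.
msgs <- character(0)
fit_unweighted <- withCallingHandlers(
  glm(prop ~ x, family = binomial),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
msgs

# With the totals supplied, prop * trials is integer-valued,
# so this particular warning is not triggered.
fit_weighted <- glm(prop ~ x, family = binomial, weights = trials)
```

Without weights, every row is treated as a single trial, so the sample sizes behind the proportions are not used when computing standard errors.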

glmer(Occupancy~VS_G+HP_spp+(1|VS_Date)+(1|Pair_nr), family="binomial", data=PM_data106)

In this model, you can add all 3 variables representing vegetation coverage, although you should keep an eye out for possible interactions between the three coverage variables. Also, I think it would make more ecological sense to specify the host plant species as a fixed effect, because it is relevant to see whether one plant species or another is preferred by the animal for laying eggs.

Cheers,
