Solved – lme4: glmer problems with offset()

binomial distributionlme4-nlmeoffsetr

this is my first post, so I hope everything is in the right format. I have some problems with glmer and don't know how to fix it, so I hope somebody can help me out with this. I could not find an answer to this anywhere.

My experiment: The experimental setup has eight sites, each site has a central area in which organisms get marked. From the central area, organisms have the choice to go into three different areas which are equipped with traps. The three areas are crossed with sites. The recapture rates in these three areas are very low and the sampling effort (trap days) is very high. Due to external influences, traps got destroyed to different degrees in all the sampling areas. In addition to the marked organisms, unmarked organisms of the same species get caught, too.

The dataset looks something like this:

library(lme4)
data <- structure(list(site = c("A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D", "E", "E", "E", "F", "F", "F", "G", "G", "G", "H", "H", "H"), area = c("I", "II", "III", "I", "II", "III", "I", "II", "III", "I", "II", "III", "I", "II", "III", "I", "II", "III", "I", "II", "III", "I", "II", "III"), marked = c(2, 6, 3, 5, 3, 9, 0, 8, 1, 1, 1, 18, 3, 0, 0, 1, 5, 6, 3, 0, 2, 2, 4, 5), unmarked = c(38, 78, 104, 1, 6, 10, 1, 13, 0, 13,7, 85, 7, 1, 0, 9, 4, 36, 3, 4, 3, 10, 20, 29), sampl_effort = c(9300, 9100, 8700, 9900, 9600, 8600, 9800, 9400, 10800, 11600, 11000, 13950, 10300, 9700, 9800, 10450, 10100, 10800, 9600, 9900, 9300, 11800, 11250, 9450)), .Names = c("site", "area", "marked", "unmarked", "sampl_effort"), row.names = c(NA,-24), class = "data.frame")

Now I wanted to fit a glmer. Because the amount of marked organisms may be related to the total amount of organisms, I chose to take a binomial approach, with cbind(marked, unmarked). I use area with the three treatments as explanatory variable, and site as random factor. Because the sampling effort differs between the different areas, I want to include it as an offset. The code looks like this:

mod.glmer1= glmer(cbind(marked,unmarked) ~ area + (1 | site) + offset(sampl_effort), family=binomial, data=data)

Then, I get the error:

Error: (maxstephalfit) PIRLS step-halvings failed to reduce deviance in pwrssUpdate

If I try the glmer without the offset, everything works out fine:

mod.glmer2= glmer(cbind(marked,unmarked) ~ area + (1 | site), family=binomial, data=data)

Out of interest, I tried a glm without the random factor and with offset:

mod.glm1= glm(cbind(marked,unmarked) ~ area + offset(sampl_effort), family=binomial, data=data)

and I get the following Warnings:

Warning messages:
1: glm.fit: fitted probabilities numerically 0 or 1 occurred
2: glm.fit: fitted probabilities numerically 0 or 1 occurred

By looking at the fitted values

round(cbind(data[,3:5],fits=fitted(mod.glm1)),8)

I can see that all the fitted values are 0.
I now thought that the offset is just too large to get reasonable fitted value. As the sampling effort is arbitrary (instead of days, I could have taken sampling hours, weeks, etc.), I decided to divide the sampling effort by 1000.
By doing this, the glm now works:

mod.glm2= glm(cbind(marked,unmarked) ~ area + offset(sampl_effort/1000), family=binomial, data=data)

However, for the glmer

mod.glmer3= glmer(cbind(marked,unmarked) ~ area + (1 | site) + offset(sampl_effort/1000), family=binomial, data=data)

I still get

Error: (maxstephalfit) PIRLS step-halvings failed to reduce deviance in pwrssUpdate

So my questions are basically:

Am I allowed to divide the sampling effort by 1000? In my mind it should lead to the same results, as the relative differences in sampling effort stay the same (and sampling days are a arbitrary measurement). However, I of course tried it with e.g. dividing by 10000 and I get different results.
How can I include the sampling effort in the glmer? The sampling effort is just too important to keep it out, however, I also need the sites as random factor. Is the offset the right approach and if yes, why doesn't it work.

P.S.: My session information:

R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] car_2.0-19 lme4_1.0-6 Matrix_1.1-2 lattice_0.20-23

loaded via a namespace (and not attached):
[1] grid_3.0.2 MASS_7.3-29 minqa_1.2.3 nlme_3.1-111 nnet_7.3-7
[6] Rcpp_0.11.0 splines_3.0.2 tools_3.0.2

Best Answer

I don't think you do want to add sampling_effort as a offset at all.

You are estimating the logit of marked over unmarked catch probabilities - call that quantity $\eta$ - and you think, quite reasonably, that the quantity $N=$ marked+unmarked will be a function of how long the traps remain undamaged, i.e. a function of sampling_effort. However, it's not obvious why that relationship is relevant; your binomial model already treats $N$ as known and conditions on it.

The circumstance I can imagine where it would be relevant is when $\eta$ really depends on sampling_effort. To examples would be a) if the rate of unmarked captures were constant but the rate of marked captures is increasing because of some feature of the experiment such as marked animals being expected to take a random walk from a common origin, or b) if capturing one kind of animal makes it more or less likely to capture the other type. But even in these cases sampling_effort would be a regular covariate and not have its coefficient fixed at 1 as an offset would have.

Here's some possibly relevant discussion of this question elsewhere on the site.

Related Solutions

Solved – lme4: Problems trying to run negative binomial glmer.nb

There a several things going on here.

It's not worth worrying about any aspects of diagnostics or model fit until you sure you've specified your model (and in this particular case your response variable) correctly. For binomial GLM(M)s in R, the response matrix should be cbind(num_successes,num_failure). Is your Total.ON the total number of cases measured for each observation (i.e. successes + failures)? If so, you should use cbind(Total.ON-Not.ON,Not.ON) ...

if (as mentioned in comments) Total.ON is a proportion successful and Not.ON is a total number of observations, you should use either cbind(Total.ON*Not.ON,(1-Total.ON)*Not.ON) or (my preference) use Total.ON as your response and include weights=Not.ON as an additional argument.
the results of your overdispersion test:

chisq      ratio        rdf          p  
87.310363   0.285328 306.000000   1.000000

actually suggest underdispersion; the ratio is less than one and the p-value is 1.00

It's a very common (and understandable) confusion to think that the negative binomial is an extension of the binomial distribution that's appropriate for accounting for overdispersion. Actually, the negative binomial extends the Poisson distribution. The section on overdispersion in the GLMM FAQ suggests various methods for dealing with overdispersion in binomial models: observation-level random effects (= logit-Normal-binomial models), beta-binomial models (e.g. in glmmTMB), quasi-likelihood, ...
if you do end up having underdispersion, you can use quasi-likelihood (at present I don't know of an underdispersed binomial analogue); there is a new section in the GLMM FAQ on underdispersion

Solved – Residual pattern for mixed models (tried lmer and glmer)

According to the developers. It is almost impossible:

https://cran.r-project.org/web/packages/DHARMa/vignettes/DHARMa.html#calculating-scaled-residuals

Best Answer

Related Solutions

Solved – lme4: Problems trying to run negative binomial glmer.nb

Solved – Residual pattern for mixed models (tried lmer and glmer)

Related Question