Solved – Modelling count data where offset variable is 0 for some observations

count-data, generalized linear model, offset

I'm trying to help a student of a colleague. The student observed and counted bird behaviour (number of calls) in an experimental setup. The number of calls attributable to a specific bird during each experiment couldn't be determined, but it was possible to count the number of birds that contributed to the calls recorded. My initial suggestion was therefore to include the log of the number of birds as an offset term in a Poisson GLM, so that we would be modelling the expected number of calls per bird.
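To spell out what the offset does, here is the model in symbols. The notation is my own assumption for illustration: $Y_i$ is the number of calls on occasion $i$, $n_i$ the number of birds, $x_i$ the covariates and $\beta$ the coefficients.

$$\log \mathrm{E}[Y_i] = \log(n_i) + x_i^\top \beta \quad\Longleftrightarrow\quad \frac{\mathrm{E}[Y_i]}{n_i} = \exp(x_i^\top \beta)$$

The second form is the expected number of calls per bird, and it only makes sense when $n_i > 0$.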

The problem is that on many observation occasions no birds (and hence no calls) were observed. The software (R in this case) complains because $\log(0) = -\infty$ (R complains about y containing -Inf, but that is purely the result of offset(log(nbirds)) being -Inf).

I actually suspect we need a hurdle model (or similar) where we have a separate binomial model for "calls observed?" (or not) and a truncated count model for the number of calls (per bird) in situations where there were calls, where we include the offset term only in the count part of the model.

I tried this using the pscl package in R, but I'm still getting the same error:

mod1 <- hurdle(NumberCallsCOPO ~ Condition * MoonVis +
               offset(log(NumberCOPO)) | 1, data = Data,
               dist = "poisson")

because the same R code (glm.fit is used internally by hurdle() to fit the count model part) is checking for -Inf even though I don't think it would affect the model fit for those observations. (Is that a correct assumption?)

I can get the model to fit by adding a small number to NumberCOPO (say 0.0001) but this is a fudge at best.

Would adding this small continuity correction be OK in practice? If not, what other approaches should we be considering when handling data where we might want to use an offset in a Poisson model where the offset variable can take the value 0? All of the examples I have come across are for situations where a 0 would not be possible for the offset variable.

Best Answer

So the response you want to model is "Number of calls per bird" and the troublesome lines are where you didn't observe any birds? Just drop those rows. They add no information to the thing you are trying to model.
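One way to see why those rows carry no information: with zero birds the model mean is 0, and a Poisson distribution with mean 0 puts probability 1 on a count of 0, so such a row contributes nothing to the log-likelihood whatever the coefficients are. A minimal sketch of dropping them before fitting is below. The data are simulated stand-ins (the column names are borrowed from the question, the values and effect sizes are made up), and `DataObs` is a hypothetical name for the filtered data frame.

```r
# Simulated stand-in data: column names mirror the question, values are invented.
set.seed(1)
n <- 200
Data <- data.frame(
  Condition  = factor(sample(c("control", "playback"), n, replace = TRUE)),
  MoonVis    = runif(n),
  NumberCOPO = rpois(n, 1.5)   # number of birds; some occasions have 0
)

# Calls can only occur when birds are present: mean = birds * per-bird rate.
rate <- exp(0.5 + 0.3 * (Data$Condition == "playback"))
Data$NumberCallsCOPO <- rpois(n, Data$NumberCOPO * rate)

# Drop occasions with no birds so that log(NumberCOPO) is finite everywhere.
DataObs <- subset(Data, NumberCOPO > 0)

# Ordinary Poisson GLM for calls per bird on the remaining rows.
mod <- glm(NumberCallsCOPO ~ Condition * MoonVis + offset(log(NumberCOPO)),
           family = poisson, data = DataObs)
summary(mod)
```

The same filtered data frame could be passed to `hurdle()` if a hurdle model is still wanted for occasions where birds were present but silent.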
