Should I use a Poisson distribution for non-integer, count-like data?

Tags: glmm, lme4-nlme, modeling, poisson distribution, r

It's my first question here; I hope I'm asking it correctly. I am trying to find out how to analyse non-integer count data (yes!). I am looking at the effect of a given treatment on habitat suitability for some birds, measured as the number of territories. Some of the territories lie in between two plots with different treatments, so I had to split those territories between the plots. I end up with half and quarter territories.

EDIT: My dataset looks like this:

   year         plot    treatment   territories    location surface
1  1985         1569         ctrl           1.0     Cheyres     1.2
2  1986         1569         ctrl           1.0     Cheyres     1.2
3  1987         1569            1           0.0     Cheyres     1.2 
4  1988         1569            2           2.0     Cheyres     1.2
5  1989         1569            3           6.5     Cheyres     1.2
6  1990         1569            1           1.5     Cheyres     1.2

Here, year, plot, location and treatment are factors.

I've tried a GLMM with a Poisson distribution (in R):

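# Random intercepts for year and for plot nested in location; if territories scale
# with surface, a log-link offset belongs on the log scale: log(surface), not surface
glmmacrsci1 <- glmer(territories ~ treatment + (1|year) + (1|location/plot),
                     offset = log(surface), family = poisson, data = acrsci)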

When I run this, I get the usual non-integer warnings, e.g.:

In dpois(y, mu, log = TRUE) : non-integer x = 1.500000

and I get infinite AIC, BIC, and deviance:

$AICtab
 AIC      BIC   logLik deviance df.resid 
 Inf      Inf     -Inf      Inf      775

Most other questions about non-integer counts were about rates, which can apparently be handled by using an offset. However, I don't think that is possible in my case.

My questions to you:

1) Is it correct to use a GLMM with a Poisson distribution for such data? (I don't think so, but glmer seems to work anyway.)

2) Can you think of any alternative to Poisson for my data?

Best Answer

1) Is it correct to use a GLMM with a Poisson distribution for such data? (I don't think so, but glmer seems to work anyway.)

No, it is not correct. By "count data" we generally mean data that record a number of cases, so they can only be non-negative integers. The same holds for the Poisson distribution, which is a distribution for non-negative integer-valued data. Under a Poisson distribution, the probability of observing a non-integer value is zero, and R behaves accordingly:

dpois(c(1, 1.5, 2, 2.5, 3), 5)
## [1] 0.03368973 0.00000000 0.08422434 0.00000000 0.14037390
## Warning messages:
## 1: In dpois(c(1, 1.5, 2, 2.5, 3), 5) : non-integer x = 1.500000
## 2: In dpois(c(1, 1.5, 2, 2.5, 3), 5) : non-integer x = 2.500000
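For reference, here is the standard Poisson probability mass function with mean λ. It assigns positive probability only to the non-negative integers, which is why dpois returns 0 for 1.5 and 2.5 above (e.g. for λ = 5, P(Y = 1) = 5e^{-5} ≈ 0.0337, matching the first value):

```latex
P(Y = y \mid \lambda) =
\begin{cases}
  \dfrac{\lambda^{y} e^{-\lambda}}{y!}, & y \in \{0, 1, 2, \dots\},\\[6pt]
  0, & \text{otherwise (e.g. } y = 1.5 \text{ or } y = 2.5\text{)}.
\end{cases}
```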

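You can still fit a log-linear GLMM to these data, but assuming a Poisson distribution means treating every non-integer value as impossible, which is why R throws these warnings. Consequently, the log-likelihood and everything based on it, such as AIC, BIC and deviance, won't be what you want them to be: a single zero-probability observation makes the log-likelihood -Inf, which is exactly where the infinite AIC, BIC and deviance in the question come from.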
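A minimal sketch of that arithmetic, using made-up values (not the asker's data) and arbitrary means standing in for fitted values. The Poisson log-likelihood is a sum of per-observation log-probabilities, so each non-integer observation contributes log(0) and the total collapses:

```r
# Made-up responses: three whole territories and two split ones (6.5, 1.5)
y  <- c(1, 0, 2, 6.5, 1.5)
# Arbitrary means standing in for fitted values (illustration only)
mu <- rep(2, length(y))

# Per-observation Poisson log-probabilities; dpois() warns about
# non-integer x and returns log(0) = -Inf for 6.5 and 1.5
ll_terms <- suppressWarnings(dpois(y, mu, log = TRUE))
print(ll_terms)

# The log-likelihood is the sum of these terms
ll <- sum(ll_terms)

# AIC = -2 * logLik + 2k (k = 1 parameter here, purely for illustration)
aic <- -2 * ll + 2 * 1
print(c(logLik = ll, AIC = aic))
```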

This doesn't mean you cannot fit a log-linear regression to non-integer data. You can, but you cannot assume a Poisson distribution for such data.
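One possible illustration (a swapped-in technique, not something this answer prescribes) is a quasi-Poisson model: it keeps the log link and a variance proportional to the mean, but estimates a dispersion parameter instead of assuming a full Poisson likelihood, so non-integer responses are accepted without warnings. The sketch below is an assumption-laden toy: plain glm with fixed effects only, and a data frame whose column names mirror the question's but whose values are invented. It also shows the trade-off: without a likelihood there is no AIC.

```r
# Made-up data; column names mirror the question, values are invented
toy <- data.frame(
  treatment   = factor(rep(c("ctrl", "1", "2"), each = 4)),
  territories = c(1, 1, 0.5, 2, 0, 1.5, 0.25, 1, 2, 6.5, 3, 2.5),
  surface     = rep(c(1.2, 0.8), times = 6)
)

# Log-linear model with quasi-Poisson variance (Var = phi * mu);
# log(surface) enters as an offset on the log scale
fit <- glm(territories ~ treatment + offset(log(surface)),
           family = quasipoisson(link = "log"), data = toy)

print(coef(fit))                # treatment effects on the log scale
print(summary(fit)$dispersion)  # phi is estimated, not fixed at 1
print(AIC(fit))                 # NA: no full likelihood, hence no AIC
```

The question's year and location/plot structure would need a mixed-model extension, which this sketch does not attempt.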

See also the thread "What regression model is the most appropriate to use with count data?" (also check the discussion in the comments below the answer) and "How does a Poisson distribution work when modeling continuous data and does it result in information loss?".
