Poisson GLM in R – Issues with Rates vs. Counts

generalized linear modelpoisson distributionr

I'm currently working on a project involving GLMs (and eventually GAMs) of some count data over time. Normally I'd do this in SAS, but I'm trying to move to R, and having…issues.

When I fit a GLM to count data using the following:

cdi_model <- glm(counts ~ exposure + covariate + month, data=test, family = poisson)

I get:

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9825  -0.7903  -0.1187   0.5717   1.7649  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.97563    0.20117   9.821  < 2e-16 ***
exposure     0.94528    0.30808   3.068  0.00215 ** 
covariate   -0.01317    0.28044  -0.047  0.96254    
months      -0.03203    0.01303  -2.458  0.01398 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 40.219  on 29  degrees of freedom
Residual deviance: 29.297  on 26  degrees of freedom
AIC: 137.7

Number of Fisher Scoring iterations: 5

Ignore for a moment the performance, or lack thereof of the model itself – mostly playing with syntax and the like at this point.

However, when I try to fit rate data (counts/person-days) and use an offset like so:
cdi_model <- glm(count_rate ~ exposure + covariate + months + offset(log(pd)), data=test, family = poisson)

I get 50+ warnings, all "1: In dpois(y, mu, log = TRUE) : non-integer x = 0.002082" etc. That is more than one for each observation (there's only 30 in the data set).

Additionally, the model fit seems to go to pot. Output as follows:

 Deviance Residuals: 
       Min          1Q      Median          3Q         Max  
-0.0273656  -0.0122169   0.0002396   0.0072269   0.0258643  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -15.40110   15.12772  -1.018    0.309
exposure      0.84848   22.18012   0.038    0.969
covariate    -0.02751   21.31262  -0.001    0.999
months       -0.01889    0.95977  -0.020    0.984

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 0.0068690  on 29  degrees of freedom
Residual deviance: 0.0054338  on 26  degrees of freedom
AIC: Inf

Number of Fisher Scoring iterations: 9

Despite this, if I plot the predicted rate against the actual data, the fit doesn't look that much worse, and the actual effect estimate doesn't seem to change all that much.

Anyone have an idea what's going on – or if everything's going right and I'm missing something due to inexperience?

Best Answer

When you add the offset you don't need to (and shouldn't) also compute the rate and include the exposure.

I don't know whether this is the cause of the errors, but if the exposure per case is person days pd, then the dependent variable should be counts and the offset should be log(pd), like this:

cdi_model <- glm(counts ~ covariate + months + offset(log(pd)), 
                 data=test, family = poisson)