I'm currently working on a project involving GLMs (and eventually GAMs) of some count data over time. Normally I'd do this in SAS, but I'm trying to move to R, and having…issues.
When I fit a GLM to count data using the following:
cdi_model <- glm(counts ~ exposure + covariate + month, data=test, family = poisson)
I get:
Deviance Residuals:
Min 1Q Median 3Q Max
-1.9825 -0.7903 -0.1187 0.5717 1.7649
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.97563 0.20117 9.821 < 2e-16 ***
exposure 0.94528 0.30808 3.068 0.00215 **
covariate -0.01317 0.28044 -0.047 0.96254
months -0.03203 0.01303 -2.458 0.01398 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 40.219 on 29 degrees of freedom
Residual deviance: 29.297 on 26 degrees of freedom
AIC: 137.7
Number of Fisher Scoring iterations: 5
Ignore for a moment the performance, or lack thereof of the model itself – mostly playing with syntax and the like at this point.
However, when I try to fit rate data (counts/person-days) and use an offset like so:
cdi_model <- glm(count_rate ~ exposure + covariate + months + offset(log(pd)), data=test, family = poisson)
I get 50+ warnings, all "1: In dpois(y, mu, log = TRUE) : non-integer x = 0.002082" etc. That is more than one for each observation (there's only 30 in the data set).
Additionally, the model fit seems to go to pot. Output as follows:
Deviance Residuals:
Min 1Q Median 3Q Max
-0.0273656 -0.0122169 0.0002396 0.0072269 0.0258643
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -15.40110 15.12772 -1.018 0.309
exposure 0.84848 22.18012 0.038 0.969
covariate -0.02751 21.31262 -0.001 0.999
months -0.01889 0.95977 -0.020 0.984
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 0.0068690 on 29 degrees of freedom
Residual deviance: 0.0054338 on 26 degrees of freedom
AIC: Inf
Number of Fisher Scoring iterations: 9
Despite this, if I plot the predicted rate against the actual data, the fit doesn't look that much worse, and the actual effect estimate doesn't seem to change all that much.
Anyone have an idea what's going on – or if everything's going right and I'm missing something due to inexperience?
Best Answer
When you add the offset you don't need to (and shouldn't) also compute the rate and include the exposure.
I don't know whether this is the cause of the errors, but if the exposure per case is person days
pd
, then the dependent variable should becounts
and the offset should belog(pd)
, like this: