What type of data is needed for offset in a Poisson regression model – R

Tags: offset, poisson-regression, population, r, regression

I am trying to fit a Poisson regression to the following data, which give infant deaths per year for the North and the South of England.

     ageband agecat midage year deaths population Divide Gender percentage
2965     <01    <01    0.5 1965   5033     214400  North      1  2.3474813
989      <01    <01    0.5 1965   3952     199000  South      1  1.9859296
2984     <01    <01    0.5 1966   4999     210900  North      1  2.3703177
1008     <01    <01    0.5 1966   3850     196900  South      1  1.9553073
3003     <01    <01    0.5 1967   4663     208700  North      1  2.2343076
1027     <01    <01    0.5 1967   3525     194200  South      1  1.8151390
3022     <01    <01    0.5 1968   4603     204400  North      1  2.2519569
1046     <01    <01    0.5 1968   3616     188400  South      1  1.9193206
3041     <01    <01    0.5 1969   4507     204100  North      1  2.2082313
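To make the problem reproducible, the printed rows can be typed into R as a small data frame. This is a sketch: it contains only the nine rows shown above, the name nsmale_rows is made up here, and reading percentage as 100 × deaths / population is an assumption that the final check confirms for these rows.

```r
# Only the rows printed in the question; row names kept as shown there.
# `nsmale_rows` is a hypothetical name, not the asker's full nsmaleMerge.
nsmale_rows <- data.frame(
  ageband    = "<01",
  agecat     = "<01",
  midage     = 0.5,
  year       = c(1965, 1965, 1966, 1966, 1967, 1967, 1968, 1968, 1969),
  deaths     = c(5033, 3952, 4999, 3850, 4663, 3525, 4603, 3616, 4507),
  population = c(214400, 199000, 210900, 196900, 208700, 194200,
                 204400, 188400, 204100),
  Divide     = factor(rep(c("North", "South"), length.out = 9)),
  Gender     = 1,
  percentage = c(2.3474813, 1.9859296, 2.3703177, 1.9553073, 2.2343076,
                 1.8151390, 2.2519569, 1.9193206, 2.2082313),
  row.names  = c("2965", "989", "2984", "1008", "3003",
                 "1027", "3022", "1046", "3041")
)

# Check the assumed meaning of `percentage`: deaths per 100 population,
# agreeing with the printed values to their 7 decimal places.
print(all.equal(100 * nsmale_rows$deaths / nsmale_rows$population,
                nsmale_rows$percentage, tolerance = 1e-6))
```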

This is what I am running in R (nsmaleMerge is my data). Am I using the offset parameter correctly, or should I not be wrapping population in a log function?

poissonM <- glm(deaths~Divide, nsmaleMerge, offset(log(population)), family = poisson(link = "log"))

The count variable is deaths, the covariate is Divide (North/South), and the exposure is population.
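Written out, this is the usual log-link Poisson model with an exposure offset. The notation is introduced here for illustration: μ_i is the expected number of deaths in row i, N_i is that row's population, and S_i is 1 for South rows and 0 for North rows. The offset enters the linear predictor with its coefficient fixed at 1, on the same log scale as the mean:

```latex
\begin{aligned}
\text{deaths}_i &\sim \operatorname{Poisson}(\mu_i) \\
\log \mu_i &= \log N_i + \beta_0 + \beta_1 S_i \\
\frac{\mu_i}{N_i} &= \exp(\beta_0 + \beta_1 S_i)
\end{aligned}
```

Under this model exp(β_0) is the North death rate per person and exp(β_1) is the South-to-North rate ratio. Using N_i itself as the offset would instead give log μ_i = N_i + β_0 + β_1 S_i, with N_i around 200,000.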

When I try 'offset = population' without the log, I get an error asking me to supply starting values. With the log, as in the call above, the model runs and this is the output:

Call:
glm(formula = deaths ~ Divide, family = poisson(link = "log"), 
    data = nsmaleMerge, weights = offset(log(population)))

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-108.82   -77.21   -28.82    32.27   203.75  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  7.5750418  0.0009075    8347   <2e-16 ***
DivideSouth -0.1843193  0.0013457    -137   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 808081  on 103  degrees of freedom
Residual deviance: 789242  on 102  degrees of freedom
AIC: 800662

Number of Fisher Scoring iterations: 5

Do I need the log, or is there an error in my data that stops population from working on its own as the offset?
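A sketch of why offset = population fails while offset = log(population) works, using only the nine rows printed above (the data frame name d is hypothetical). With the raw offset, population itself is added to the log of the expected count; the row-to-row differences in population run to thousands, and exp() of numbers that large overflows to Inf, so glm cannot find coefficients that give finite fitted values. The exact error text may differ between R versions, so the sketch only checks that an error is raised.

```r
# Only the nine rows printed in the question; `d` is a hypothetical name.
d <- data.frame(
  deaths     = c(5033, 3952, 4999, 3850, 4663, 3525, 4603, 3616, 4507),
  population = c(214400, 199000, 210900, 196900, 208700, 194200,
                 204400, 188400, 204100),
  Divide     = factor(rep(c("North", "South"), length.out = 9))
)

# Doubles overflow in exp() above roughly 709, far below these populations.
print(exp(max(d$population)))  # Inf

# Raw population as the offset, as in the question: glm stops with an error.
raw_fit <- tryCatch(
  glm(deaths ~ Divide, data = d, offset = population,
      family = poisson(link = "log")),
  error = function(e) e
)
if (inherits(raw_fit, "error")) {
  message("raw offset failed: ", conditionMessage(raw_fit))
}

# log(population) as the offset: the fit converges normally.
log_fit <- glm(deaths ~ Divide, data = d, offset = log(population),
               family = poisson(link = "log"))
print(log_fit$converged)
```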

Best Answer

It is a programming error.

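The fourth argument of glm is weights, not offset. Because family is passed by name, your three unnamed arguments fill formula, data and weights in that order, which is why the Call in your output reads weights = offset(log(population)). So either use named arguments (offset = log(population)) or add the offset to the formula like + offset(log(population)). Keep the log in both cases: with a log link the offset is added to the log of the expected count, so it has to be log(population).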
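A minimal sketch of this answer on the nine rows printed in the question. The name d and the by-hand rate check are additions for illustration, not part of the original answer. It shows that the original call stores log(population) as prior weights, that the two suggested fixes give the same coefficients, and that exponentiating those coefficients recovers the pooled death rates computed directly from the data.

```r
# Only the nine rows printed in the question; `d` is a hypothetical name.
d <- data.frame(
  deaths     = c(5033, 3952, 4999, 3850, 4663, 3525, 4603, 3616, 4507),
  population = c(214400, 199000, 210900, 196900, 208700, 194200,
                 204400, 188400, 204100),
  Divide     = factor(rep(c("North", "South"), length.out = 9))
)

# The original call: `family` is named, so the unnamed arguments fill
# formula, data, weights in that order; offset() just returns its argument.
buggy_fit <- glm(deaths ~ Divide, d, offset(log(population)),
                 family = poisson(link = "log"))
print(buggy_fit$call)           # shows weights = offset(log(population))
print(buggy_fit$prior.weights)  # log(population), used as case weights

# Fix 1: pass the offset by name.
fit_named <- glm(deaths ~ Divide, data = d, offset = log(population),
                 family = poisson(link = "log"))

# Fix 2: put the offset term in the formula.
fit_formula <- glm(deaths ~ Divide + offset(log(population)), data = d,
                   family = poisson(link = "log"))

# Both fixes estimate the same coefficients.
print(cbind(named = coef(fit_named), formula = coef(fit_formula)))

# With one two-level factor and a log(population) offset, exp(intercept) is
# the pooled North death rate and exp(DivideSouth) the South/North rate ratio.
north <- d$Divide == "North"
rate_north <- sum(d$deaths[north]) / sum(d$population[north])
rate_south <- sum(d$deaths[!north]) / sum(d$population[!north])
print(rbind(model   = exp(coef(fit_named)),
            by_hand = c(rate_north, rate_south / rate_north)))
```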
