I am trying to do a Poisson regression using the following data, where infant deaths are shown per year for both North and South England.
     ageband agecat midage year deaths population Divide Gender percentage
2965     <01    <01    0.5 1965   5033     214400  North      1  2.3474813
989      <01    <01    0.5 1965   3952     199000  South      1  1.9859296
2984     <01    <01    0.5 1966   4999     210900  North      1  2.3703177
1008     <01    <01    0.5 1966   3850     196900  South      1  1.9553073
3003     <01    <01    0.5 1967   4663     208700  North      1  2.2343076
1027     <01    <01    0.5 1967   3525     194200  South      1  1.8151390
3022     <01    <01    0.5 1968   4603     204400  North      1  2.2519569
1046     <01    <01    0.5 1968   3616     188400  South      1  1.9193206
3041     <01    <01    0.5 1969   4507     204100  North      1  2.2082313
This is what I am running in R (nsmaleMerge is my data). Am I using the offset parameter correctly, or should I not be enclosing it in a log function?
poissonM <- glm(deaths~Divide, nsmaleMerge, offset(log(population)), family = poisson(link = "log"))
deaths is the count variable, Divide (North/South) is the covariate, and population is the exposure.
When I try 'offset = population' without the log function, I get an error about not including start values. When I use the log function as shown above, it runs fine and gives this output:
Call:
glm(formula = deaths ~ Divide, family = poisson(link = "log"),
data = nsmaleMerge, weights = offset(log(population)))
Deviance Residuals:
Min 1Q Median 3Q Max
-108.82 -77.21 -28.82 32.27 203.75
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 7.5750418 0.0009075 8347 <2e-16 ***
DivideSouth -0.1843193 0.0013457 -137 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 808081 on 103 degrees of freedom
Residual deviance: 789242 on 102 degrees of freedom
AIC: 800662
Number of Fisher Scoring iterations: 5
Do I need the log function, or is there an error in my data that shows up when I use population by itself as the offset?
Best Answer
It is a programming error.
The fourth argument of glm is weights, not offset. So either use named arguments or add the offset to the formula, like + offset(log(population)).
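As a minimal sketch of the two fixes, the code below fits the model both ways on the nine rows shown in the question. The small data frame is rebuilt by hand from those rows (only the columns the model needs), and the object names fit_named and fit_formula are made up for illustration. The coefficients will not match the output in the question, which came from the full data set.

```r
# Nine rows copied from the question; the real nsmaleMerge has more rows
nsmaleMerge <- data.frame(
  year       = c(1965, 1965, 1966, 1966, 1967, 1967, 1968, 1968, 1969),
  deaths     = c(5033, 3952, 4999, 3850, 4663, 3525, 4603, 3616, 4507),
  population = c(214400, 199000, 210900, 196900, 208700,
                 194200, 204400, 188400, 204100),
  Divide     = factor(c("North", "South", "North", "South", "North",
                        "South", "North", "South", "North"))
)

# Option 1: pass the offset as a named argument so it is not
# matched positionally to weights
fit_named <- glm(deaths ~ Divide, family = poisson(link = "log"),
                 data = nsmaleMerge, offset = log(population))

# Option 2: put the offset term in the formula
fit_formula <- glm(deaths ~ Divide + offset(log(population)),
                   family = poisson(link = "log"), data = nsmaleMerge)

# Both calls specify the same model, so the estimates should agree
print(all.equal(coef(fit_named), coef(fit_formula)))

# Exponentiated coefficients: baseline death rate and South/North rate ratio
print(exp(coef(fit_named)))
```

The log is needed in both forms. With a log link the offset is added to the linear predictor, so log(population) turns the model into one for deaths per head of population, while a raw population value would be treated as if it were already on the log scale.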