Question has been rewritten
I am trying to fit a glm to find out how the rate of events happens (counts/exposure) related to some covariates, with Poisson error.
Counts is the number of events happened and the covariates include age, sex, smoking status, policy class and exposure, where policy class has 6 levels: class 0, class 1, …, class 5
I was wondering how to:
-
keeping the intercept (if possible? if not please would you explain me why not). [Answered by Affine]
-
making the coefficients of polclass0, …, polclass5 sum to 0, I know somehow I need to use
contr.sum
, but it doesn't work (see below) [An example from other site, but I couldn't replicated it. -
I used the +0 in the formula to make the polclass0 "take over" the intercept. However, is it possible to have other categorial variable (e.g. sex, or some categorial variable that takes 3+ values) to have two all coefficients display?
Example:
set.seed(123)
sex <- as.factor(sample(c(0,1), 50, replace=T, prob=c(0.5, 0.5)))
smoker <- as.factor(sample(c(0,1), 50, replace=T, prob=c(0.3, 0.7)))
polclass <- as.factor(sample(0:5, 50, replace=T))
age <- sample(16:80, 50, replace=T)
count <- rpois(50, 2.5)
exposure <- rgamma(50, 100)
glm1 <- glm(count ~ polclass + sex + smoker + age + offset(log(exposure)) + 0 ,
family=poisson(link='log'), contrasts = list(polclass = contr.sum))
summary(glm1)
Call:
glm(formula = count ~ polclass + sex + smoker + age + offset(log(exposure)) +
0, family = poisson(link = "log"), contrasts = list(polclass = contr.sum))
Deviance Residuals:
Min 1Q Median 3Q Max
-2.3604 -0.5625 -0.1263 0.5246 1.5584
Coefficients:
Estimate Std. Error z value Pr(>|z|)
polclass0 -3.719037 0.423105 -8.790 <2e-16 ***
polclass1 -3.347628 0.328248 -10.198 <2e-16 ***
polclass2 -3.638992 0.368175 -9.884 <2e-16 ***
polclass3 -3.833882 0.426819 -8.982 <2e-16 ***
polclass4 -3.780837 0.399512 -9.464 <2e-16 ***
polclass5 -3.838007 0.373120 -10.286 <2e-16 ***
sex1 -0.057738 0.091936 -0.628 0.530
smoker1 -0.174328 0.118654 -1.469 0.142
age -0.001846 0.006362 -0.290 0.772
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 8764.667 on 50 degrees of freedom
Residual deviance: 40.865 on 41 degrees of freedom
AIC: 189
Number of Fisher Scoring iterations: 5
as you can see, the contr.sum doesn't work and I would like to know why and how to do that exactly?
Thanks in advance!
Best Answer
In short: The intercept takes on the reference level. Each Polclass in the output is the incremental gain over the reference level.
If you want to explore the differences between Polclass, you would use
polclass <- relevel(polclass, "1")
and then re-run GLM. You'll do this for each reference level.I re-ran the model with the different reference levels and then compiled the coefficients in the table above. All other coefficients (except the intercept) stay the same but each polclass changes at the different reference levels. This happens since the intercept takes into account the reference level average.
If you want to examine the impact of the different polclasses, you'd have to look at each model and make some sort of inference based on your understanding of the data.