R – How to Fit a GLM with Sum to Zero Constraints Without Reference Level

Question has been rewritten

I am trying to fit a glm to find out how the rate of events happens (counts/exposure) related to some covariates, with Poisson error.
Counts is the number of events happened and the covariates include age, sex, smoking status, policy class and exposure, where policy class has 6 levels: class 0, class 1, …, class 5

I was wondering how to:

keeping the intercept (if possible? if not please would you explain me why not). [Answered by Affine]
making the coefficients of polclass0, …, polclass5 sum to 0, I know somehow I need to use contr.sum, but it doesn't work (see below) [An example from other site, but I couldn't replicated it.
I used the +0 in the formula to make the polclass0 "take over" the intercept. However, is it possible to have other categorial variable (e.g. sex, or some categorial variable that takes 3+ values) to have two all coefficients display?

Example:

    set.seed(123)
    sex <- as.factor(sample(c(0,1), 50, replace=T, prob=c(0.5, 0.5)))
    smoker <- as.factor(sample(c(0,1), 50, replace=T, prob=c(0.3, 0.7)))
    polclass <- as.factor(sample(0:5, 50, replace=T))
    age <- sample(16:80, 50, replace=T)
    count <- rpois(50, 2.5)
    exposure <- rgamma(50, 100)

    glm1 <- glm(count ~ polclass + sex + smoker + age + offset(log(exposure)) + 0 ,
                family=poisson(link='log'), contrasts = list(polclass = contr.sum))

    summary(glm1)


        Call:
        glm(formula = count ~ polclass + sex + smoker + age + offset(log(exposure)) + 
            0, family = poisson(link = "log"), contrasts = list(polclass = contr.sum))

        Deviance Residuals: 
            Min       1Q   Median       3Q      Max  
        -2.3604  -0.5625  -0.1263   0.5246   1.5584  

        Coefficients:
                   Estimate Std. Error z value Pr(>|z|)    
        polclass0 -3.719037   0.423105  -8.790   <2e-16 ***
        polclass1 -3.347628   0.328248 -10.198   <2e-16 ***
        polclass2 -3.638992   0.368175  -9.884   <2e-16 ***
        polclass3 -3.833882   0.426819  -8.982   <2e-16 ***
        polclass4 -3.780837   0.399512  -9.464   <2e-16 ***
        polclass5 -3.838007   0.373120 -10.286   <2e-16 ***
        sex1      -0.057738   0.091936  -0.628    0.530    
        smoker1   -0.174328   0.118654  -1.469    0.142    
        age       -0.001846   0.006362  -0.290    0.772    
        ---
        Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

        (Dispersion parameter for poisson family taken to be 1)

            Null deviance: 8764.667  on 50  degrees of freedom
        Residual deviance:   40.865  on 41  degrees of freedom
        AIC: 189

        Number of Fisher Scoring iterations: 5

as you can see, the contr.sum doesn't work and I would like to know why and how to do that exactly?

Thanks in advance!

Ref0 Ref1 Ref2 Ref3 Ref4 Ref5 (Intercept) -3.95110 -3.57969 -3.87106 -4.06595 -4.01290 -4.07007 sex1 0.11548 0.11548 0.11548 0.11548 0.11548 0.11548 smoker1 0.34866 0.34866 0.34866 0.34866 0.34866 0.34866 polclass0 NA -0.37141 -0.08004 0.11485 0.06180 0.11897 polclass1 0.37141 NA 0.29136 0.48625 0.43321 0.49038 polclass2 0.08004 -0.29136 NA 0.19489 0.14185 0.19902 polclass3 -0.11485 -0.48625 -0.19489 NA -0.05304 0.00413 polclass4 -0.06180 -0.43321 -0.14185 0.05304 NA 0.05717 polclass5 -0.11897 -0.49038 -0.19902 -0.00413 -0.05717 NA age -0.00185 -0.00185 -0.00185 -0.00185 -0.00185 -0.00185

Best Answer

In short: The intercept takes on the reference level. Each Polclass in the output is the incremental gain over the reference level.

If you want to explore the differences between Polclass, you would use polclass <- relevel(polclass, "1") and then re-run GLM. You'll do this for each reference level.

I re-ran the model with the different reference levels and then compiled the coefficients in the table above. All other coefficients (except the intercept) stay the same but each polclass changes at the different reference levels. This happens since the intercept takes into account the reference level average.

If you want to examine the impact of the different polclasses, you'd have to look at each model and make some sort of inference based on your understanding of the data.

Best Answer

Related Solutions

Solved – Identical coefficients estimated in Poisson vs Quasi-Poisson model

Solved – GLM – Zero-Inflated distribution

Related Question