How to interpret coefficients in a Poisson regression with interaction terms

Tags: generalized-linear-model, poisson-distribution, r

This question is a follow-up to this question: How to interpret coefficients in a Poisson regression?

If we follow almost exactly the same routine but add an interaction between the variables treatment and improved (just for the sake of my question, which is about interpreting the output), we get:

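```r
# Simulated data: healthvalue and numberofdrugs are pure noise, so no real
# effects are expected. rpois() is random; call set.seed() first if you
# want reproducible numbers.
treatment     <- factor(rep(c(1, 2), c(43, 41)),
                        levels = c(1, 2),
                        labels = c("placebo", "treated"))
improved      <- factor(rep(c(1, 2, 3, 1, 2, 3), c(29, 7, 7, 13, 7, 21)),
                        levels = c(1, 2, 3),
                        labels = c("none", "some", "marked"))
numberofdrugs <- rpois(84, 10) + 1
healthvalue   <- rpois(84, 5)
y             <- data.frame(healthvalue, numberofdrugs, treatment, improved)
# treatment:improved adds the interaction between the two factors
test          <- glm(healthvalue ~ numberofdrugs + treatment + improved + treatment:improved,
                     family = poisson, data = y)
summary(test)
```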

Note the `treatment:improved` term I added to the `glm()` formula.
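In R's formula syntax, `a * b` is shorthand for `a + b + a:b`, so the same model can be written more compactly. A minimal check, assuming data simulated as in the question but with an arbitrary `set.seed(1)` (so the estimates will not match the output below); `fit_colon` and `fit_star` are illustrative names:

```r
set.seed(1)  # arbitrary seed; the estimates will not match the question's output
treatment <- factor(rep(c(1, 2), c(43, 41)), levels = c(1, 2),
                    labels = c("placebo", "treated"))
improved  <- factor(rep(c(1, 2, 3, 1, 2, 3), c(29, 7, 7, 13, 7, 21)),
                    levels = c(1, 2, 3), labels = c("none", "some", "marked"))
y <- data.frame(healthvalue   = rpois(84, 5),
                numberofdrugs = rpois(84, 10) + 1,
                treatment, improved)

# Interaction written out explicitly, as in the question
fit_colon <- glm(healthvalue ~ numberofdrugs + treatment + improved + treatment:improved,
                 family = poisson, data = y)
# Same model via the * shorthand: treatment * improved = treatment + improved + treatment:improved
fit_star  <- glm(healthvalue ~ numberofdrugs + treatment * improved,
                 family = poisson, data = y)

names(coef(fit_star))
all.equal(coef(fit_colon), coef(fit_star))  # TRUE: same design matrix, same fit
```

Both forms produce the same seven coefficient names as the `summary()` output in the question.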

Now, we get the following output:

```
Call:
glm(formula = healthvalue ~ numberofdrugs + treatment + improved + 
    treatment:improved, family = poisson, data = y)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.9261  -0.8733  -0.0296   0.5473   2.3358  

Coefficients:
                                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)                      1.553051   0.184229   8.430   <2e-16 ***
numberofdrugs                    0.004298   0.014242   0.302   0.7628    
treatmenttreated                 0.007399   0.149440   0.050   0.9605    
improvedsome                     0.358897   0.164891   2.177   0.0295 *  
improvedmarked                  -0.178360   0.203756  -0.875   0.3814    
treatmenttreated:improvedsome   -0.330336   0.265310  -1.245   0.2131    
treatmenttreated:improvedmarked  0.050617   0.260203   0.195   0.8458    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 97.805  on 83  degrees of freedom
Residual deviance: 89.276  on 77  degrees of freedom
AIC: 383.29

Number of Fisher Scoring iterations: 5
```

Setting aside that most of these coefficients do not look significant, here is my question:

I understand that, as in the original post, treatment = placebo and improved = none are the base levels of those variables and are therefore set to zero. My question is: why are there no interaction terms involving the base levels treatment = placebo and improved = none?

I thought setting the base levels to zero was just a construct, and in my mind there should still be an interaction involving them… (?)

Best Answer

You wrote:

> I understand that, as in the original post, treatment = placebo and improved = none are the base levels of those variables and are therefore set to zero. My question is: why are there no interaction terms involving the base levels treatment = placebo and improved = none?

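Because they are set to 0, and 0 multiplied by anything is still 0. That is what dummy (treatment) coding does: each interaction column is the product of a treatment indicator and an improved indicator, so every cell that involves a base level gets 0 in all interaction columns, and its mean is described by the intercept and main effects alone.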
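Written out as an equation, with hypothetical indicator names $d_T$ (1 if treatment = treated), $d_S$ (1 if improved = some) and $d_M$ (1 if improved = marked), each 0 otherwise, and $\beta_0, \dots, \beta_6$ standing for the seven estimates in the order `summary()` prints them, the model's linear predictor is:

$$
\log \mu = \beta_0 + \beta_1\,\mathrm{numberofdrugs} + \beta_2 d_T + \beta_3 d_S + \beta_4 d_M + \beta_5\, d_T d_S + \beta_6\, d_T d_M
$$

Holding numberofdrugs fixed and dropping the common $\beta_1$ term, the six cells work out to:

$$
\begin{aligned}
\text{placebo, none:}\quad & \beta_0 & \qquad \text{treated, none:}\quad & \beta_0 + \beta_2 \\
\text{placebo, some:}\quad & \beta_0 + \beta_3 & \qquad \text{treated, some:}\quad & \beta_0 + \beta_2 + \beta_3 + \beta_5 \\
\text{placebo, marked:}\quad & \beta_0 + \beta_4 & \qquad \text{treated, marked:}\quad & \beta_0 + \beta_2 + \beta_4 + \beta_6
\end{aligned}
$$

Every placebo row has $d_T = 0$ and every none row has $d_S = d_M = 0$, so $d_T d_S$ and $d_T d_M$ vanish in those four cells; only the two cells that are both treated and improved pick up an interaction parameter.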

Think about why treatment = placebo does not show up: It's set to 0 to allow the other levels of treatment to be compared to it.
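The default coding (`contr.treatment`) can be inspected directly. A small sketch, assuming the factors are built as in the question; `cells` and `X` are illustrative names. `contrasts()` shows that the base level gets a row of zeros and no column of its own, and `model.matrix()` shows that each interaction column is the product of a treatment column and an improved column:

```r
treatment <- factor(rep(c(1, 2), c(43, 41)), levels = c(1, 2),
                    labels = c("placebo", "treated"))
improved  <- factor(rep(c(1, 2, 3, 1, 2, 3), c(29, 7, 7, 13, 7, 21)),
                    levels = c(1, 2, 3), labels = c("none", "some", "marked"))

contrasts(treatment)  # placebo row is 0: no column for the base level
contrasts(improved)   # none row is 0 in both columns

# One row per treatment x improved cell; unique() keeps the original factor levels
cells <- expand.grid(treatment = unique(treatment), improved = unique(improved))
X <- model.matrix(~ treatment * improved, data = cells)

# Rows for placebo or none have 0 in both interaction columns
cbind(cells, X[, -1])
```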

The same goes for treatment = placebo in interactions: the interaction columns are 0 for the cell treatment = placebo, improved = some, so that it serves as the reference for treatment = treated, improved = some.
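To see what the interaction coefficient compares, one can predict the log expected count (the linear predictor) for every cell at a common value of numberofdrugs. A sketch, assuming simulated data with an arbitrary seed and an arbitrary `numberofdrugs = 10`; `cells` and `cell_eta()` are illustrative names. The treated-minus-placebo gap among improved = some, minus the same gap among the base level improved = none, reproduces `treatmenttreated:improvedsome`:

```r
set.seed(1)  # arbitrary seed; the estimates will not match the question's output
treatment <- factor(rep(c(1, 2), c(43, 41)), levels = c(1, 2),
                    labels = c("placebo", "treated"))
improved  <- factor(rep(c(1, 2, 3, 1, 2, 3), c(29, 7, 7, 13, 7, 21)),
                    levels = c(1, 2, 3), labels = c("none", "some", "marked"))
y <- data.frame(healthvalue   = rpois(84, 5),
                numberofdrugs = rpois(84, 10) + 1,
                treatment, improved)
test <- glm(healthvalue ~ numberofdrugs + treatment + improved + treatment:improved,
            family = poisson, data = y)

# One row per cell, all at the same (arbitrary) numberofdrugs
cells <- expand.grid(treatment = unique(y$treatment), improved = unique(y$improved),
                     numberofdrugs = 10)
cells$eta <- unname(predict(test, newdata = cells, type = "link"))  # log expected count

cell_eta <- function(tr, im) cells$eta[cells$treatment == tr & cells$improved == im]

# (treated - placebo | some) - (treated - placebo | none): a difference of differences
did_some <- (cell_eta("treated", "some") - cell_eta("placebo", "some")) -
            (cell_eta("treated", "none") - cell_eta("placebo", "none"))
did_some
coef(test)["treatmenttreated:improvedsome"]  # same value

# On the count scale, exp() of the interaction is a ratio of rate ratios
exp(did_some)
```

The numberofdrugs term cancels in every difference, which is why its value does not matter here.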

There are other parameterizations of categorical variables that do not work exactly this way. Personally, I find them harder to interpret, but you could look at Helmert coding or effect (sum-to-zero) coding, for example.
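In R, those codings can be requested through the `contrasts` argument of `glm()`, using the built-in `contr.sum` (effect, or sum-to-zero, coding) and `contr.helmert`. A sketch, assuming the same kind of simulated data with an arbitrary seed: the coefficients, and therefore what each one means, change, but the fitted model does not. Under sum coding no level is coded as all zeros (the last level gets -1), so the interaction columns are no longer 0 for the placebo or none cells:

```r
set.seed(1)  # arbitrary seed
treatment <- factor(rep(c(1, 2), c(43, 41)), levels = c(1, 2),
                    labels = c("placebo", "treated"))
improved  <- factor(rep(c(1, 2, 3, 1, 2, 3), c(29, 7, 7, 13, 7, 21)),
                    levels = c(1, 2, 3), labels = c("none", "some", "marked"))
y <- data.frame(healthvalue   = rpois(84, 5),
                numberofdrugs = rpois(84, 10) + 1,
                treatment, improved)
# Same model as in the question; * expands to treatment + improved + treatment:improved
f <- healthvalue ~ numberofdrugs + treatment * improved

fit_dummy   <- glm(f, family = poisson, data = y)  # default: contr.treatment
fit_sum     <- glm(f, family = poisson, data = y,
                   contrasts = list(treatment = "contr.sum", improved = "contr.sum"))
fit_helmert <- glm(f, family = poisson, data = y,
                   contrasts = list(treatment = "contr.helmert", improved = "contr.helmert"))

coef(fit_sum)                # columns are numbered (treatment1, improved1, ...), not named by level
head(model.matrix(fit_sum))  # first rows are placebo/none: treatment1:improved1 is 1, not 0

# Different parameterization, same fitted means
all.equal(fitted(fit_dummy), fitted(fit_sum))
all.equal(fitted(fit_dummy), fitted(fit_helmert))
```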