This is a follow-up to the question How to interpret coefficients in a Poisson regression?
If we follow (almost) exactly the same routine, but add an interaction between the variables treatment and improved (just for the sake of my question, which is about interpreting the output), the code becomes:
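```r
# Two-level factor: 43 placebo subjects followed by 41 treated subjects
treatment <- factor(rep(c(1, 2), c(43, 41)),
                    levels = c(1, 2),
                    labels = c("placebo", "treated"))
# Three-level outcome factor: placebo rows first, then treated rows
improved <- factor(rep(c(1, 2, 3, 1, 2, 3), c(29, 7, 7, 13, 7, 21)),
                   levels = c(1, 2, 3),
                   labels = c("none", "some", "marked"))
# Random covariate and response (no seed is set, so results differ between runs)
numberofdrugs <- rpois(84, 10) + 1
healthvalue <- rpois(84, 5)
y <- data.frame(healthvalue, numberofdrugs, treatment, improved)
# Poisson GLM with both main effects plus the treatment-by-improved interaction
test <- glm(healthvalue ~ numberofdrugs + treatment + improved + treatment:improved,
            data = y, family = poisson)
summary(test)
```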
Note the `treatment:improved` term I added to the `glm()` formula.
Now, we get the following output:
```
Call:
glm(formula = healthvalue ~ numberofdrugs + treatment + improved +
    treatment:improved, family = poisson, data = y)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.9261  -0.8733  -0.0296   0.5473   2.3358

Coefficients:
                                 Estimate Std. Error z value Pr(>|z|)
(Intercept)                      1.553051   0.184229   8.430   <2e-16 ***
numberofdrugs                    0.004298   0.014242   0.302   0.7628
treatmenttreated                 0.007399   0.149440   0.050   0.9605
improvedsome                     0.358897   0.164891   2.177   0.0295 *
improvedmarked                  -0.178360   0.203756  -0.875   0.3814
treatmenttreated:improvedsome   -0.330336   0.265310  -1.245   0.2131
treatmenttreated:improvedmarked  0.050617   0.260203   0.195   0.8458
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 97.805  on 83  degrees of freedom
Residual deviance: 89.276  on 77  degrees of freedom
AIC: 383.29

Number of Fisher Scoring iterations: 5
```
Setting aside which coefficients look significant, here is my question:
I understand that, as in the original post, treatment = placebo and improved = none are the base levels of those variables, and are therefore set to zero. My question is: why are there no interaction terms involving the base levels treatment = placebo and improved = none?
I thought setting the base levels to zero was just a convention, and in my mind there should still be interactions involving them…(?)
Best Answer
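You ask why there are no interaction terms involving the base levels treatment = placebo and improved = none.
Because those levels are coded as 0, and 0 multiplied by anything is still 0. That is what dummy coding does.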
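One way to see this directly is to inspect the design matrix R builds under its default dummy (treatment) coding. This minimal sketch rebuilds only the two factors from the question and calls `model.matrix()`; `X` is an illustrative name. There is no column for placebo or none, and each interaction column is the product of two 0/1 dummies, so it is 0 whenever either factor is at its base level.

```r
# Rebuild the two factors exactly as in the question (no randomness involved)
treatment <- factor(rep(c(1, 2), c(43, 41)),
                    levels = c(1, 2),
                    labels = c("placebo", "treated"))
improved <- factor(rep(c(1, 2, 3, 1, 2, 3), c(29, 7, 7, 13, 7, 21)),
                   levels = c(1, 2, 3),
                   labels = c("none", "some", "marked"))

# Design matrix for the factor part of the model, using default dummy coding
X <- model.matrix(~ treatment * improved)
print(colnames(X))

# Each interaction column is the elementwise product of the two dummies
stopifnot(all(X[, "treatmenttreated:improvedsome"] ==
              X[, "treatmenttreated"] * X[, "improvedsome"]))
stopifnot(all(X[, "treatmenttreated:improvedmarked"] ==
              X[, "treatmenttreated"] * X[, "improvedmarked"]))

# One distinct row per cell of the 2 x 3 table: which columns are "switched on"
print(unique(X))
```

The placebo/none row of `unique(X)` has a 1 only in the intercept column, which is exactly why that cell is the reference for everything else.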
Think about why treatment = placebo does not show up on its own: it is set to 0 so that the other levels of treatment can be compared to it.
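Written out, the linear predictor of the fitted model under dummy coding looks like this. Here $T$, $S$ and $M$ are my own shorthand (not R's) for the 0/1 indicators of treatment = treated, improved = some and improved = marked, and $\eta_0 = \beta_0 + \beta_1 \cdot \text{numberofdrugs}$. In R's output, $\beta_T$ is `treatmenttreated`, $\beta_S$ is `improvedsome`, and $\beta_{TS}$ is `treatmenttreated:improvedsome`.

$$
\begin{aligned}
\log \mu &= \eta_0 + \beta_T T + \beta_S S + \beta_M M + \beta_{TS}\,TS + \beta_{TM}\,TM \\[4pt]
\text{placebo, none:}\quad \log \mu &= \eta_0 \\
\text{treated, none:}\quad \log \mu &= \eta_0 + \beta_T \\
\text{placebo, some:}\quad \log \mu &= \eta_0 + \beta_S \\
\text{treated, some:}\quad \log \mu &= \eta_0 + \beta_T + \beta_S + \beta_{TS}
\end{aligned}
$$

Every placebo cell has $T = 0$, so $\beta_T$, $\beta_{TS}$ and $\beta_{TM}$ all drop out there; a separate placebo coefficient (or a placebo-by-improved interaction) would be redundant with $\eta_0$, $\beta_S$ and $\beta_M$.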
The same goes for treatment = placebo in interactions: the cell treatment = placebo, improved = some is set to 0 so that the cell treatment = treated, improved = some can be compared to it.
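That comparison can be checked with plain arithmetic on the estimates from the question's output. This sketch hard-codes those printed values (so it describes only that particular random run) and computes the treated-versus-placebo difference on the log scale within each level of improved. The intercept and numberofdrugs terms cancel in each difference, so they are left out; `b` and `log_rr` are illustrative names.

```r
# Estimates copied from the summary() output in the question
b <- c(treatmenttreated                  =  0.007399,
       improvedsome                      =  0.358897,
       improvedmarked                    = -0.178360,
       "treatmenttreated:improvedsome"   = -0.330336,
       "treatmenttreated:improvedmarked" =  0.050617)

# Log rate ratio, treated vs placebo, within each level of improved.
# The placebo cell is the reference at every level, so the comparison is the
# treatment main effect plus (outside "none") the matching interaction term.
log_rr <- c(
  none   = unname(b["treatmenttreated"]),
  some   = unname(b["treatmenttreated"] + b["treatmenttreated:improvedsome"]),
  marked = unname(b["treatmenttreated"] + b["treatmenttreated:improvedmarked"])
)

print(round(log_rr, 6))
print(round(exp(log_rr), 3))  # multiplicative change in the expected count
```

These are point estimates only; the output above shows large standard errors for both interaction terms.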
There are other parameterizations of categorical variables that do not do exactly this. Personally, I find them harder to interpret, but you could look at Helmert coding or effect (sum-to-zero) coding, for example.
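To try those alternatives, here is a minimal sketch that refits the same model with `contr.sum` (effect coding) and `contr.helmert` (Helmert coding) through the `contrasts` argument of `glm()`. The data are regenerated with a seed of my choosing (`set.seed(1)` is an assumption, not from the question), so the numbers will not match the output above, and the `fit_*` names are illustrative. The fitted means are identical across codings; only the coefficients, and what they mean, change.

```r
set.seed(1)  # hypothetical seed; the question's data were not seeded

treatment <- factor(rep(c(1, 2), c(43, 41)), levels = c(1, 2),
                    labels = c("placebo", "treated"))
improved <- factor(rep(c(1, 2, 3, 1, 2, 3), c(29, 7, 7, 13, 7, 21)),
                   levels = c(1, 2, 3), labels = c("none", "some", "marked"))
numberofdrugs <- rpois(84, 10) + 1
healthvalue <- rpois(84, 5)
y <- data.frame(healthvalue, numberofdrugs, treatment, improved)

f <- healthvalue ~ numberofdrugs + treatment * improved

# Default dummy (treatment) coding, as in the question
fit_dummy <- glm(f, family = poisson, data = y)

# Effect (sum-to-zero) coding for both factors
fit_effect <- glm(f, family = poisson, data = y,
                  contrasts = list(treatment = "contr.sum", improved = "contr.sum"))

# Helmert coding for both factors
fit_helmert <- glm(f, family = poisson, data = y,
                   contrasts = list(treatment = "contr.helmert",
                                    improved = "contr.helmert"))

# Same model, same fitted means: only the parameterization differs
stopifnot(isTRUE(all.equal(fitted(fit_dummy), fitted(fit_effect))))
stopifnot(isTRUE(all.equal(fitted(fit_dummy), fitted(fit_helmert))))

print(coef(fit_dummy))
print(coef(fit_effect))
print(coef(fit_helmert))
```

Under `contr.sum` and `contr.helmert` no level is coded as a row of all zeros (the last or first level gets -1 entries instead), so the coefficients are contrasts among levels rather than differences from a single base level.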