How can I interpret the main effects (coefficients for dummy-coded factor) in a Poisson regression?
Assume the following example:
```r
# Two-level treatment factor: 43 placebo, 41 treated
treatment <- factor(rep(c(1, 2), c(43, 41)),
                    levels = c(1, 2),
                    labels = c("placebo", "treated"))
# Three-level improvement factor within each treatment group
improved <- factor(rep(c(1, 2, 3, 1, 2, 3), c(29, 7, 7, 13, 7, 21)),
                   levels = c(1, 2, 3),
                   labels = c("none", "some", "marked"))
numberofdrugs <- rpois(84, 10) + 1
# Outcome simulated independently of all predictors (no seed is set)
healthvalue <- rpois(84, 5)
y <- data.frame(healthvalue, numberofdrugs, treatment, improved)
test <- glm(healthvalue ~ numberofdrugs + treatment + improved,
            data = y, family = poisson)
summary(test)
```
The output is:
```
Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)       1.88955    0.19243   9.819   <2e-16 ***
numberofdrugs    -0.02303    0.01624  -1.418    0.156
treatmenttreated -0.01271    0.10861  -0.117    0.907    MAIN EFFECT
improvedsome     -0.13541    0.14674  -0.923    0.356    MAIN EFFECT
improvedmarked   -0.10839    0.12212  -0.888    0.375    MAIN EFFECT
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```
I know that the incidence rate ratio for numberofdrugs is exp(-0.023) = 0.977. But how do I interpret the main effects for the dummy variables?
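Exponentiating every row of the table in the same way gives the multiplicative factors for the dummy variables as well. A minimal R sketch that uses only the estimates printed above (the names est, rr and some_vs_marked are illustrative):

```r
# Estimates copied from the summary(test) table above; they come from one
# unseeded random draw, so re-running the simulation gives other numbers.
est <- c(
  "(Intercept)"      =  1.88955,
  "numberofdrugs"    = -0.02303,
  "treatmenttreated" = -0.01271,
  "improvedsome"     = -0.13541,
  "improvedmarked"   = -0.10839
)

# exp() turns each log-scale coefficient into a multiplicative factor:
# the intercept becomes a rate, every other term a rate ratio.
rr <- exp(est)
print(round(rr, 3))

# A dummy coefficient compares its level with the reference level ("none").
# To compare two non-reference levels, subtract their coefficients first.
some_vs_marked <- exp(est[["improvedsome"]] - est[["improvedmarked"]])
print(round(some_vs_marked, 3))
```

Read treatmenttreated ≈ 0.987 as: holding numberofdrugs and improved fixed, the estimated healthvalue rate for "treated" is about 0.987 times (roughly 1.3% lower than) the rate for "placebo"; improvedsome ≈ 0.873 means about 13% lower than "none". None of these differs significantly from 1 here (all p > 0.3), which is expected because healthvalue was simulated independently of the predictors.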
Best Answer
The exponentiated numberofdrugs coefficient is the multiplicative factor to apply to the estimated healthvalue when numberofdrugs increases by 1 unit, holding the other covariates fixed. For categorical (factor) variables, the exponentiated coefficient is the multiplicative factor relative to the base (first) level of that variable, because R uses treatment contrasts by default. exp(Intercept) is the estimated rate for the baseline case (numberofdrugs = 0, "placebo", improvement "none"), and all other estimates are relative to it.

In your example, the estimated healthvalue for someone with 2 drugs, "placebo" and improvement == "none" would be (using addition inside exp() as the equivalent of multiplication):

exp(1.88955 + 2*(-0.02303)) = exp(1.84349) ≈ 6.32

while someone on 4 drugs, "treated", and "some" improvement would have an estimated healthvalue of

exp(1.88955 + 4*(-0.02303) + (-0.01271) + (-0.13541)) = exp(1.64931) ≈ 5.20

ADDENDUM: This is what it means to be "additive on the log scale". "Additive on the log-odds scale" was the phrase my teacher, Barbara McKnight, used when emphasizing the need to use all applicable term values times their estimated coefficients when doing any kind of prediction. You first add up all the coefficients (including the intercept) multiplied by their covariate values, and then exponentiate the resulting sum. In R, the coefficients of a regression object are generally returned by the coef() extractor function. So the estimate for a subject with 4 drugs, "treated", with "some" improvement would be exp(coef(test)[1] + 4*coef(test)[2] + coef(test)[3] + coef(test)[4]), and the linear predictor for that case is the sum of each coefficient times its covariate value: 1 for the intercept, 4 for numberofdrugs, 1 for treatmenttreated, 1 for improvedsome and 0 for improvedmarked.
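A self-contained R sketch of that sum, assuming a freshly simulated data set: the seed is an arbitrary choice, so the numbers will differ from the table in the question. It rebuilds the question's data, writes the covariate row for 4 drugs, "treated", "some" improvement in the order of coef(test), and checks the hand calculation against predict(..., type = "response"). The names b, x_row, by_hand and from_predict are illustrative.

```r
set.seed(1)  # arbitrary seed; the question's output came from an unseeded run

# Rebuild the question's simulated data and model
treatment <- factor(rep(c(1, 2), c(43, 41)), levels = c(1, 2),
                    labels = c("placebo", "treated"))
improved <- factor(rep(c(1, 2, 3, 1, 2, 3), c(29, 7, 7, 13, 7, 21)),
                   levels = c(1, 2, 3), labels = c("none", "some", "marked"))
numberofdrugs <- rpois(84, 10) + 1
healthvalue <- rpois(84, 5)
y <- data.frame(healthvalue, numberofdrugs, treatment, improved)
test <- glm(healthvalue ~ numberofdrugs + treatment + improved,
            data = y, family = poisson)

b <- coef(test)
# Order of b: (Intercept), numberofdrugs, treatmenttreated, improvedsome, improvedmarked
x_row <- c(1, 4, 1, 1, 0)  # 4 drugs, "treated", "some" improvement

eta <- sum(b * x_row)  # linear predictor: additive on the log scale
by_hand <- exp(eta)    # back-transform to the rate scale

newsubject <- data.frame(
  numberofdrugs = 4,
  treatment = factor("treated", levels = levels(treatment)),
  improved = factor("some", levels = levels(improved))
)
from_predict <- predict(test, newdata = newsubject, type = "response")
print(c(by_hand = by_hand, predict = unname(from_predict)))
```

The two values agree because predict() with type = "response" performs exactly this sum followed by exp(), the inverse of the Poisson log link.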
These principles should apply to any stats package that returns a table of coefficients to the user; the method and principles are more general than my use of R might suggest.
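Because only the printed table is needed, the calculation can be written as a matrix product of covariate rows with the coefficient vector, which works the same way whatever package produced the estimates. A sketch using the coefficients from the question's output; the profile names are hypothetical labels, and the third profile is an extra illustration:

```r
# Coefficients as printed in the question's summary(test) table
b <- c(1.88955, -0.02303, -0.01271, -0.13541, -0.10839)

# One row per subject profile; columns follow the order of b:
# intercept, numberofdrugs, treatmenttreated, improvedsome, improvedmarked
profiles <- rbind(
  placebo_none_2drugs   = c(1, 2, 0, 0, 0),
  treated_some_4drugs   = c(1, 4, 1, 1, 0),
  treated_marked_4drugs = c(1, 4, 1, 0, 1)
)

eta <- drop(profiles %*% b)  # linear predictors, one per profile
rates <- exp(eta)            # estimated healthvalue rates
print(round(rates, 2))
```

The first two rows correspond to the two subjects described in the answer. Each factor contributes at most one 1 per row, since every subject sits at exactly one level and the reference level has no column of its own.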
I'm copying selected clarifying comments since they 'disappear' in the default display:
A: The coefficients are the natural logarithms of the ratios. – DWin
A2: No. If it were logistic regression they would be, but in Poisson regression, where the LHS is the number of events and the implicit denominator is the number at risk, the exponentiated coefficients are "rate ratios" or "relative risks".
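To report the exponentiated coefficients as rate ratios together with their uncertainty, one option is to exponentiate the estimates and the ends of Wald intervals computed on the log scale with confint.default(). A sketch, assuming freshly simulated data with an arbitrary seed (so the numbers will not match the question's table); note that confint() on a glm fit gives profile-likelihood intervals instead, which can differ somewhat.

```r
set.seed(3)  # arbitrary seed

# Rebuild the question's simulated data and model
treatment <- factor(rep(c(1, 2), c(43, 41)), levels = c(1, 2),
                    labels = c("placebo", "treated"))
improved <- factor(rep(c(1, 2, 3, 1, 2, 3), c(29, 7, 7, 13, 7, 21)),
                   levels = c(1, 2, 3), labels = c("none", "some", "marked"))
numberofdrugs <- rpois(84, 10) + 1
healthvalue <- rpois(84, 5)
y <- data.frame(healthvalue, numberofdrugs, treatment, improved)
test <- glm(healthvalue ~ numberofdrugs + treatment + improved,
            data = y, family = poisson)

# Wald intervals are symmetric on the log scale; exponentiating both the
# estimate and the interval ends moves everything to the rate-ratio scale.
rr_table <- exp(cbind(RR = coef(test), confint.default(test)))
print(round(rr_table, 3))
```

A rate-ratio interval that contains 1 corresponds to a log-scale interval that contains 0, so the conclusions match the z tests in summary(test).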