If you have only two categorical variables and you include the interaction between them, then nbreg and anova are equivalent in the sense that they produce exactly the same predicted values. As a consequence, both models are "equally right" or "equally wrong".
sysuse nlsw88, clear
gen byte black = race == 2 if race < 3
anova wage black##union
predict yhat_anova
nbreg wage black##union
predict yhat_nbreg
tab yhat_anova yhat_nbreg
Fitted | Predicted number of events
values | 5.968046 7.5821 8.614504 8.735929 | Total
-----------+--------------------------------------------+----------
5.968046 | 350 0 0 0 | 350
7.5821 | 0 1,051 0 0 | 1,051
8.614504 | 0 0 151 0 | 151
8.735929 | 0 0 0 302 | 302
-----------+--------------------------------------------+----------
Total | 350 1,051 151 302 | 1,854
This is not just true for a model with two categorical variables and their interaction; it is true for any fully saturated model, i.e. any model that includes only categorical variables and all interactions, including all higher-order interactions.
Edit:
In order to see why these models are just different ways of saying the same thing, it is easiest to go back to basics. Consider the example above: both models model the mean outcome, in this case mean wage. There are only 4 means to model: black non-union, black union, white non-union, and white union. Both models model these 4 means with 4 parameters (constant, main effect for union, main effect for black, and the interaction term). As a consequence, neither model imposes any constraint on these means.
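To see this numerically, here is a small pure-Python sketch (Python is used only for illustration; the cell means are the ones from the nlsw88 example): with 4 parameters for 4 means, both the additive (anova/regress) and the multiplicative (nbreg/poisson) parametrization can be solved exactly, so both reproduce the observed means.

```python
import math

# Cell means from the nlsw88 example (union status x black)
means = {("nonunion", "white"): 7.5821, ("nonunion", "black"): 5.968046,
         ("union", "white"): 8.735929, ("union", "black"): 8.614504}

# Additive parametrization: mean = b0 + b1*union + b2*black + b3*union*black
b0 = means[("nonunion", "white")]
b1 = means[("union", "white")] - b0
b2 = means[("nonunion", "black")] - b0
b3 = means[("union", "black")] - b0 - b1 - b2

# Multiplicative parametrization: mean = exp(g0 + g1*union + g2*black + g3*union*black)
g0 = math.log(means[("nonunion", "white")])
g1 = math.log(means[("union", "white")]) - g0
g2 = math.log(means[("nonunion", "black")]) - g0
g3 = math.log(means[("union", "black")]) - g0 - g1 - g2

# Both parametrizations reproduce all four means exactly
for (u, b), m in means.items():
    iu, ib = int(u == "union"), int(b == "black")
    add = b0 + b1 * iu + b2 * ib + b3 * iu * ib
    mul = math.exp(g0 + g1 * iu + g2 * ib + g3 * iu * ib)
    print(f"{u:8s} {b:5s}  observed={m:.6f}  additive={add:.6f}  multiplicative={mul:.6f}")
```

Note that `b3` comes out as the OLS interaction coefficient (1.492629) and `exp(g3)` as the Poisson interaction IRR (1.252791) reported further down: with a saturated model the parameters are just a re-expression of the cell means.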
Let's start by just looking at these means:
table union black, c(mean wage)
------------------------------
union | black
worker | 0 1
----------+-------------------
nonunion | 7.5821 5.968046
union | 8.735929 8.614504
------------------------------
So, among non-union members there is a noticeable difference in mean wage between black and white persons, while among union members this difference is a lot smaller.
Let's look at the regression output:
. reg wage i.union##i.black
Source | SS df MS Number of obs = 1854
-------------+------------------------------ F( 3, 1850) = 29.83
Model | 1472.83336 3 490.944452 Prob > F = 0.0000
Residual | 30445.6086 1850 16.4570858 R-squared = 0.0461
-------------+------------------------------ Adj R-squared = 0.0446
Total | 31918.442 1853 17.225279 Root MSE = 4.0567
------------------------------------------------------------------------------
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
union |
union | 1.153829 .2648625 4.36 0.000 .6343677 1.673289
1.black | -1.614053 .2503572 -6.45 0.000 -2.105066 -1.123041
|
union#black |
union#1 | 1.492629 .4755625 3.14 0.002 .5599329 2.425324
|
_cons | 7.5821 .1251339 60.59 0.000 7.336681 7.827518
------------------------------------------------------------------------------
You can see that the mean wage for non-union white persons is the constant. This mean is 1.61 dollars/hour less for black persons. So the mean wage for black non-union persons is:
di _b[_cons]+_b[1.black]
5.9680463
Which corresponds with the mean reported in the table above. The negative effect of being black is reduced by 1.49 dollars/hour (the interaction term) for union members, i.e. black union members earn
di _b[1.black] + _b[1.union#1.black]
-.12142489
dollars/hour less than white union members, a number you can also recover from the table of means above.
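As a quick check (pure Python, numbers copied from the output above), the coefficient route and the table-of-means route give the same gap up to rounding:

```python
# Black-white wage gap among union members, computed two ways
via_coefs = -1.614053 + 1.492629   # _b[1.black] + _b[1.union#1.black]
via_means = 8.614504 - 8.735929    # mean(black union) - mean(white union)
print(via_coefs, via_means)        # both approximately -0.1214
```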
Now consider a multiplicative model, in this case a Poisson regression with robust standard errors (for a justification of that choice, see here).
. poisson wage i.union##i.black, irr vce(robust)
note: you are responsible for interpretation of noncount dep. variable
Iteration 0: log pseudolikelihood = -5239.2159
Iteration 1: log pseudolikelihood = -5239.2159
Poisson regression Number of obs = 1854
Wald chi2(3) = 106.56
Prob > chi2 = 0.0000
Log pseudolikelihood = -5239.2159 Pseudo R2 = 0.0188
------------------------------------------------------------------------------
| Robust
wage | IRR Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
union |
union | 1.152178 .0360531 4.53 0.000 1.083638 1.225053
1.black | .7871232 .0265286 -7.10 0.000 .7368083 .840874
|
union#black |
union#1 | 1.252791 .0759936 3.72 0.000 1.112359 1.410951
|
_cons | 7.5821 .1308473 117.39 0.000 7.329932 7.842942
------------------------------------------------------------------------------
Again we can see that the mean wage of white non-union members is 7.5821. Black non-union persons earn $(0.787 - 1) \times 100\% = -21\%$, i.e. 21% less than white non-union persons, a number you can recover from the table of means: $(5.968046 - 7.5821)/7.5821 \times 100\% = -21\%$. This negative effect of being black increases, that is, becomes less negative, by 25% if someone is a union member. So the effect of being black for union members is $1.25 \times .787 = .98$, or black union members earn roughly 2% less than white union members, a number you can also recover from the table of means or the regression output, both of which I will leave as an exercise to the reader.
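The same check works on the multiplicative scale (pure Python, cell means copied from the table above): the IRRs in the Poisson output are exactly ratios of the cell means.

```python
# Reproduce the Poisson IRRs from the table of cell means
m_wn, m_bn = 7.5821, 5.968046     # white / black, non-union
m_wu, m_bu = 8.735929, 8.614504   # white / black, union

irr_black = m_bn / m_wn                       # 1.black IRR, approx. 0.787
ratio_of_ratios = (m_bu / m_wu) / irr_black   # interaction IRR, approx. 1.253
effect_black_union = irr_black * ratio_of_ratios  # approx. 0.986
print(round(irr_black, 4), round(ratio_of_ratios, 4), round(effect_black_union, 4))
```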
Without seeing the dataset (not available) it seems mostly correct. The nice thing about Poisson regressions is that they can provide rates when used as suggested. One thing worth keeping in mind is that there may be overdispersion, in which case you should switch to a negative binomial regression (see the MASS package).
Poisson regression doesn't care whether the data is aggregated or not, but in practice non-aggregated data is fragile and can cause some unexpected errors. Note that you cannot have surv == 0 for any of the cases. When I tested it, the estimates were the same:
set.seed(1)
n <- 1500
data <-
data.frame(
dead = sample(0:1, n, replace = TRUE, prob = c(.9, .1)),
    surv = ceiling(exp(runif(100)) * 365), # note: 100 values, recycled over the n = 1500 rows
gender = sample(c("Male", "Female"), n, replace = TRUE),
diagnosis = sample(0:1, n, replace = TRUE),
age = sample(60:80, n, replace = TRUE),
inclusion_year = sample(1998:2011, n, replace = TRUE)
)
library(dplyr)
model <-
data %>%
group_by(gender,
diagnosis,
age,
inclusion_year) %>%
summarise(Deaths = sum(dead),
Person_time = sum(surv)) %>%
glm(Deaths ~ gender + diagnosis + I(age - 70) + I(inclusion_year - 1998) + offset(log(Person_time/10^3/365.25)),
data = . , family = poisson)
alt_model <- glm(dead ~ gender + diagnosis + I(age - 70) + I(inclusion_year - 1998) + offset(log(surv/10^3/365.25)),
data = data , family = poisson)
sum(coef(alt_model) - coef(model))
# > 1.779132e-14
sum(abs(confint(alt_model) - confint(model)))
# > 6.013114e-11
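A sketch of why aggregation doesn't change the estimates, in pure Python with toy data (not the simulated dataset above): for any fixed coefficients, the individual-level and the aggregated Poisson log-likelihoods differ only by a term that does not involve the coefficients, so both are maximized by the same estimates.

```python
import math
import random

random.seed(1)
# Toy individual-level data: binary covariate x, exposure time t, event count y
people = [(random.randint(0, 1), random.uniform(0.5, 2.0), random.randint(0, 2))
          for _ in range(200)]

def loglik_individual(b0, b1):
    # Poisson log-likelihood kernel (dropping the y! term) with offset log(t)
    return sum(y * (b0 + b1 * x + math.log(t)) - t * math.exp(b0 + b1 * x)
               for x, t, y in people)

def loglik_aggregated(b0, b1):
    # Collapse to one record per covariate pattern: total events, total exposure
    agg = {}
    for x, t, y in people:
        Y, T = agg.get(x, (0, 0.0))
        agg[x] = (Y + y, T + t)
    return sum(Y * (b0 + b1 * x + math.log(T)) - T * math.exp(b0 + b1 * x)
               for x, (Y, T) in agg.items())

# The gap between the two log-likelihoods is the same at any (b0, b1):
# it is a constant not involving the coefficients, so the maximizers coincide.
d1 = loglik_individual(0.0, 0.0) - loglik_aggregated(0.0, 0.0)
d2 = loglik_individual(-0.5, 0.3) - loglik_aggregated(-0.5, 0.3)
print(abs(d1 - d2))  # approximately 0
```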
Because the model returns a rate, it is important to center the variables so that the intercept is interpretable, e.g.:
> exp(coef(model)["(Intercept)"])
(Intercept)
51.3771
This can be interpreted as the base rate, and the covariates are then rate ratios. If we want the base rate after 10 years:
> exp(coef(model)["(Intercept)"] + coef(model)["I(inclusion_year - 1998)"]*10)
(Intercept)
47.427
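The same extrapolation can be written either on the log scale or as repeated multiplication by the yearly rate ratio; a pure-Python sketch with hypothetical coefficient values (placeholders, not the fitted values above):

```python
import math

# Hypothetical coefficients on the log scale (not the fitted values above)
b_intercept = math.log(51.0)   # base rate at inclusion_year == 1998
b_trend = -0.008               # log rate ratio per calendar year

rate_1998 = math.exp(b_intercept)
rate_2008 = math.exp(b_intercept + b_trend * 10)
# Equivalent: base rate times the yearly rate ratio to the 10th power
assert abs(rate_2008 - rate_1998 * math.exp(b_trend) ** 10) < 1e-9
print(round(rate_1998, 2), round(rate_2008, 2))
```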
I've modeled the inclusion year as a linear trend here, but you should check for nonlinearities; sometimes it is useful to categorize the time points instead. I used this approach in this article:
D. Gordon, P. Gillgren, S. Eloranta, H. Olsson, M. Gordon, J. Hansson, and K. E. Smedby, “Time trends in incidence of cutaneous melanoma by detailed anatomical location and patterns of ultraviolet radiation exposure: a retrospective population-based study,” Melanoma Res., vol. 25, no. 4, pp. 348–356, Aug. 2015.
Best Answer
Interaction terms in Poisson regression models are interpreted as a ratio of rate ratios. With an interaction term, the interpretation of that parameter would be: "the rate ratio comparing condition Y to X among individuals of type 2, relative to the rate ratio comparing condition Y to X among individuals of type 1".
OLS and Poisson regression in this case will give the same fitted means in models including the interaction, since you then have a fully specified (saturated) log-linear model for the table of fitted means. Having a different working model for the distribution of the data will of course lead to different inference (so OLS and Poisson GLMs will have different p-values across the board). However, the fitted means for the models without interaction will differ between OLS and Poisson. This is because the model is no longer saturated: OLS constrains the difference between mean rates to be constant across strata, while Poisson constrains their ratio to be constant.
\begin{equation} \begin{array}{c} \log(\lambda_{ij}) = \beta_0 + \beta_1 i + \beta_2 j \qquad (\mbox{Poisson})\\ \mu_{ij} = \gamma_0 + \gamma_1 i + \gamma_2 j \qquad (\mbox{OLS}) \end{array} \end{equation}
Taking $i = 0, 1$ to denote individual membership and $j = 0, 1$ to denote condition,
Poisson \begin{equation} \begin{array}{ccc} & j=0 & j=1 \\ i=0 & \lambda_{11} = \exp(\beta_0) & \lambda_{12} = \exp(\beta_0 + \beta_2) \\ i=1 & \lambda_{21} = \exp(\beta_0 + \beta_1) & \lambda_{22} =\exp(\beta_0 + \beta_1 + \beta_2) \\ \end{array} \end{equation}
OLS \begin{equation} \begin{array}{ccc} & j=0 & j=1 \\ i=0 & \mu_{11} = \gamma_0 & \mu_{12} = \gamma_0 + \gamma_2 \\ i=1 & \mu_{21} = \gamma_0 + \gamma_1 & \mu_{22} =\gamma_0 + \gamma_1 + \gamma_2 \\ \end{array} \end{equation}
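A small pure-Python check of the constraint these no-interaction models impose (the parameter values are arbitrary placeholders; the result is structural): the Poisson model forces the ratio of rate ratios to be exactly 1, while the OLS model forces the difference of differences to be exactly 0.

```python
import math

# Arbitrary placeholder parameter values; any values give the same result
b0, b1, b2 = 0.5, 0.3, -0.2   # Poisson (log scale)
g0, g1, g2 = 5.0, 1.2, -0.8   # OLS

lam = {(i, j): math.exp(b0 + b1 * i + b2 * j) for i in (0, 1) for j in (0, 1)}
mu = {(i, j): g0 + g1 * i + g2 * j for i in (0, 1) for j in (0, 1)}

# No-interaction Poisson forces the ratio of rate ratios to be exactly 1 ...
ratio_of_ratios = (lam[1, 1] / lam[1, 0]) / (lam[0, 1] / lam[0, 0])
# ... while no-interaction OLS forces the difference of differences to be 0
diff_of_diffs = (mu[1, 1] - mu[1, 0]) - (mu[0, 1] - mu[0, 0])
print(round(ratio_of_ratios, 10), round(diff_of_diffs, 10))
```

This is exactly why the fitted means agree only in the saturated case: a 2x2 table of means generally has a non-unit ratio of ratios (and a non-zero difference of differences), which neither three-parameter model can accommodate.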