If you have only two categorical variables and you include the interaction between them, then nbreg and anova are equivalent in the sense that they produce exactly the same predicted values. As a consequence, both models are "equally right" or "equally wrong".
sysuse nlsw88, clear
gen byte black = race == 2 if race < 3
anova wage black##union
predict yhat_anova
nbreg wage black##union
predict yhat_nbreg
tab yhat_anova yhat_nbreg
Fitted | Predicted number of events
values | 5.968046 7.5821 8.614504 8.735929 | Total
-----------+--------------------------------------------+----------
5.968046 | 350 0 0 0 | 350
7.5821 | 0 1,051 0 0 | 1,051
8.614504 | 0 0 151 0 | 151
8.735929 | 0 0 0 302 | 302
-----------+--------------------------------------------+----------
Total | 350 1,051 151 302 | 1,854
This is not just true for a model with two categorical variables and their interaction; it is true for any fully saturated model, i.e. any model that includes only categorical variables and all interactions, including all higher-order interactions.
Edit:
In order to see why these models are just different ways of saying the same thing, it is easiest to go back to basics. Consider the example above: both models model the mean outcome, in this case mean wage. There are only 4 means to model: black non-union, black union, white non-union, and white union. Both models model these 4 means with 4 parameters (constant, main effect for union, main effect for black, and the interaction term). As a consequence, neither model imposes any constraint on these means.
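To see this numerically, here is a small pure-Python sketch (Python is used only for illustration; the cell means are the ones from the nlsw88 example): with 4 parameters for 4 means, both the additive (anova/regress) and the multiplicative (nbreg/poisson) parametrization can be solved exactly, so both reproduce the observed means.

```python
import math

# Cell means from the nlsw88 example (union status x black)
means = {("nonunion", "white"): 7.5821, ("nonunion", "black"): 5.968046,
         ("union", "white"): 8.735929, ("union", "black"): 8.614504}

# Additive parametrization: mean = b0 + b1*union + b2*black + b3*union*black
b0 = means[("nonunion", "white")]
b1 = means[("union", "white")] - b0
b2 = means[("nonunion", "black")] - b0
b3 = means[("union", "black")] - b0 - b1 - b2

# Multiplicative parametrization: mean = exp(g0 + g1*union + g2*black + g3*union*black)
g0 = math.log(means[("nonunion", "white")])
g1 = math.log(means[("union", "white")]) - g0
g2 = math.log(means[("nonunion", "black")]) - g0
g3 = math.log(means[("union", "black")]) - g0 - g1 - g2

# Both parametrizations reproduce all four means exactly
for (u, b), m in means.items():
    iu, ib = int(u == "union"), int(b == "black")
    add = b0 + b1 * iu + b2 * ib + b3 * iu * ib
    mul = math.exp(g0 + g1 * iu + g2 * ib + g3 * iu * ib)
    print(f"{u:8s} {b:5s}  observed={m:.6f}  additive={add:.6f}  multiplicative={mul:.6f}")
```

Note that `b3` comes out as the OLS interaction coefficient (1.492629) and `exp(g3)` as the Poisson interaction IRR (1.252791) reported further down: with a saturated model the parameters are just a re-expression of the cell means.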
Let's start by just looking at these means:
table union black, c(mean wage)
------------------------------
union | black
worker | 0 1
----------+-------------------
nonunion | 7.5821 5.968046
union | 8.735929 8.614504
------------------------------
So, among non-union members there is a noticeable difference in mean wage between black and white persons, while among union members this difference is a lot smaller.
Let's look at the regression output:
. reg wage i.union##i.black
Source | SS df MS Number of obs = 1854
-------------+------------------------------ F( 3, 1850) = 29.83
Model | 1472.83336 3 490.944452 Prob > F = 0.0000
Residual | 30445.6086 1850 16.4570858 R-squared = 0.0461
-------------+------------------------------ Adj R-squared = 0.0446
Total | 31918.442 1853 17.225279 Root MSE = 4.0567
------------------------------------------------------------------------------
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
union |
union | 1.153829 .2648625 4.36 0.000 .6343677 1.673289
1.black | -1.614053 .2503572 -6.45 0.000 -2.105066 -1.123041
|
union#black |
union#1 | 1.492629 .4755625 3.14 0.002 .5599329 2.425324
|
_cons | 7.5821 .1251339 60.59 0.000 7.336681 7.827518
------------------------------------------------------------------------------
You can see that the mean wage for non-union white persons is the constant. This mean is 1.61 dollars/hour less for black persons. So the mean wage for black non-union persons is:
di _b[_cons]+_b[1.black]
5.9680463
Which corresponds with the mean reported in the table above. The negative effect of being black is reduced by 1.49 dollars/hour (the interaction term) for union members, i.e. black union members earn
di _b[1.black] + _b[1.union#1.black]
-.12142489
dollars/hour less than white union members, a number you can also recover from the table of means above.
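As a quick check (pure Python, numbers copied from the output above), the coefficient route and the table-of-means route give the same gap up to rounding:

```python
# Black-white wage gap among union members, computed two ways
via_coefs = -1.614053 + 1.492629   # _b[1.black] + _b[1.union#1.black]
via_means = 8.614504 - 8.735929    # mean(black union) - mean(white union)
print(via_coefs, via_means)        # both approximately -0.1214
```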
Now consider a multiplicative model, in this case a Poisson regression with robust standard errors (for a justification of that choice, see here).
. poisson wage i.union##i.black, irr vce(robust)
note: you are responsible for interpretation of noncount dep. variable
Iteration 0: log pseudolikelihood = -5239.2159
Iteration 1: log pseudolikelihood = -5239.2159
Poisson regression Number of obs = 1854
Wald chi2(3) = 106.56
Prob > chi2 = 0.0000
Log pseudolikelihood = -5239.2159 Pseudo R2 = 0.0188
------------------------------------------------------------------------------
| Robust
wage | IRR Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
union |
union | 1.152178 .0360531 4.53 0.000 1.083638 1.225053
1.black | .7871232 .0265286 -7.10 0.000 .7368083 .840874
|
union#black |
union#1 | 1.252791 .0759936 3.72 0.000 1.112359 1.410951
|
_cons | 7.5821 .1308473 117.39 0.000 7.329932 7.842942
------------------------------------------------------------------------------
Again we can see that the mean wage of white non-union members is 7.5821. Black non-union persons earn $(0.787 - 1) \times 100\% = -21\%$, i.e. 21% less than white non-union persons, a number you can recover from the table of means: $(5.968046 - 7.5821)/7.5821 \times 100\% = -21\%$. This negative effect of being black increases, that is, becomes less negative, by 25% if someone is a union member. So the effect of being black for union members is $1.25 \times .787 = .98$, or black union members earn roughly 2% less than white union members, a number you can also recover from the table of means or the regression output, both of which I will leave as an exercise to the reader.
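The same check works on the multiplicative scale (pure Python, cell means copied from the table above): the IRRs in the Poisson output are exactly ratios of the cell means.

```python
# Reproduce the Poisson IRRs from the table of cell means
m_wn, m_bn = 7.5821, 5.968046     # white / black, non-union
m_wu, m_bu = 8.735929, 8.614504   # white / black, union

irr_black = m_bn / m_wn                       # 1.black IRR, approx. 0.787
ratio_of_ratios = (m_bu / m_wu) / irr_black   # interaction IRR, approx. 1.253
effect_black_union = irr_black * ratio_of_ratios  # approx. 0.986
print(round(irr_black, 4), round(ratio_of_ratios, 4), round(effect_black_union, 4))
```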
Without seeing the dataset (not available) it seems mostly correct. The nice thing about Poisson regressions is that they can provide rates when used as suggested. One thing worth keeping in mind is that there may be overdispersion, in which case you should switch to a negative binomial regression (see the MASS package).
Poisson regression doesn't care whether the data is aggregated or not, but in practice non-aggregated data is fragile and can cause some unexpected errors. Note that you cannot have surv == 0 for any of the cases. When I tested it, the estimates were the same:
set.seed(1)
n <- 1500
data <-
data.frame(
dead = sample(0:1, n, replace = TRUE, prob = c(.9, .1)),
    surv = ceiling(exp(runif(100)) * 365), # note: 100 values, recycled over the n = 1500 rows
gender = sample(c("Male", "Female"), n, replace = TRUE),
diagnosis = sample(0:1, n, replace = TRUE),
age = sample(60:80, n, replace = TRUE),
inclusion_year = sample(1998:2011, n, replace = TRUE)
)
library(dplyr)
model <-
data %>%
group_by(gender,
diagnosis,
age,
inclusion_year) %>%
summarise(Deaths = sum(dead),
Person_time = sum(surv)) %>%
glm(Deaths ~ gender + diagnosis + I(age - 70) + I(inclusion_year - 1998) + offset(log(Person_time/10^3/365.25)),
data = . , family = poisson)
alt_model <- glm(dead ~ gender + diagnosis + I(age - 70) + I(inclusion_year - 1998) + offset(log(surv/10^3/365.25)),
data = data , family = poisson)
sum(coef(alt_model) - coef(model))
# > 1.779132e-14
sum(abs(confint(alt_model) - confint(model)))
# > 6.013114e-11
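A sketch of why aggregation doesn't change the estimates, in pure Python with toy data (not the simulated dataset above): for any fixed coefficients, the individual-level and the aggregated Poisson log-likelihoods differ only by a term that does not involve the coefficients, so both are maximized by the same estimates.

```python
import math
import random

random.seed(1)
# Toy individual-level data: binary covariate x, exposure time t, event count y
people = [(random.randint(0, 1), random.uniform(0.5, 2.0), random.randint(0, 2))
          for _ in range(200)]

def loglik_individual(b0, b1):
    # Poisson log-likelihood kernel (dropping the y! term) with offset log(t)
    return sum(y * (b0 + b1 * x + math.log(t)) - t * math.exp(b0 + b1 * x)
               for x, t, y in people)

def loglik_aggregated(b0, b1):
    # Collapse to one record per covariate pattern: total events, total exposure
    agg = {}
    for x, t, y in people:
        Y, T = agg.get(x, (0, 0.0))
        agg[x] = (Y + y, T + t)
    return sum(Y * (b0 + b1 * x + math.log(T)) - T * math.exp(b0 + b1 * x)
               for x, (Y, T) in agg.items())

# The gap between the two log-likelihoods is the same at any (b0, b1):
# it is a constant not involving the coefficients, so the maximizers coincide.
d1 = loglik_individual(0.0, 0.0) - loglik_aggregated(0.0, 0.0)
d2 = loglik_individual(-0.5, 0.3) - loglik_aggregated(-0.5, 0.3)
print(abs(d1 - d2))  # approximately 0
```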
Because the model returns a rate, it is important to center the variables so that the intercept is interpretable, e.g.:
> exp(coef(model)["(Intercept)"])
(Intercept)
51.3771
This can be interpreted as the base rate, and the covariates are then rate ratios. If we want the base rate after 10 years:
> exp(coef(model)["(Intercept)"] + coef(model)["I(inclusion_year - 1998)"]*10)
(Intercept)
47.427
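The same extrapolation can be written either on the log scale or as repeated multiplication by the yearly rate ratio; a pure-Python sketch with hypothetical coefficient values (placeholders, not the fitted values above):

```python
import math

# Hypothetical coefficients on the log scale (not the fitted values above)
b_intercept = math.log(51.0)   # base rate at inclusion_year == 1998
b_trend = -0.008               # log rate ratio per calendar year

rate_1998 = math.exp(b_intercept)
rate_2008 = math.exp(b_intercept + b_trend * 10)
# Equivalent: base rate times the yearly rate ratio to the 10th power
assert abs(rate_2008 - rate_1998 * math.exp(b_trend) ** 10) < 1e-9
print(round(rate_1998, 2), round(rate_2008, 2))
```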
I've modeled the inclusion year as a linear trend here, but you should check for nonlinearities; sometimes it is useful to categorize the time points instead. I used this approach in this article:
D. Gordon, P. Gillgren, S. Eloranta, H. Olsson, M. Gordon, J. Hansson, and K. E. Smedby, “Time trends in incidence of cutaneous melanoma by detailed anatomical location and patterns of ultraviolet radiation exposure: a retrospective population-based study,” Melanoma Res., vol. 25, no. 4, pp. 348–356, Aug. 2015.
Best Answer
Interaction terms in Poisson regression models are interpreted as a ratio of rate ratios. With an interaction term, the interpretation of that parameter would be: "the rate ratio comparing condition Y to X among individuals of type 2, relative to the rate ratio comparing condition Y to X among individuals of type 1".
OLS and Poisson regression in this case will give the same fitted means in models including the interaction, since you then have a fully specified (saturated) log-linear model for the table of fitted means. Having a different working model for the distribution of the data will of course lead to different inference (so OLS and Poisson GLMs will have different p-values across the board). However, the fitted means for the models without interaction will differ between OLS and Poisson. This is because the model is no longer saturated: OLS constrains the difference between mean rates to be constant across strata, while Poisson constrains their ratio to be constant.
\begin{equation} \begin{array}{c} \log(\lambda_{ij}) = \beta_0 + \beta_1 i + \beta_2 j \qquad (\mbox{Poisson})\\ \mu_{ij} = \gamma_0 + \gamma_1 i + \gamma_2 j \qquad (\mbox{OLS}) \end{array} \end{equation}
Taking $i = 0, 1$ to denote individual membership and $j = 0, 1$ to denote condition,
Poisson \begin{equation} \begin{array}{ccc} & j=0 & j=1 \\ i=0 & \lambda_{11} = \exp(\beta_0) & \lambda_{12} = \exp(\beta_0 + \beta_2) \\ i=1 & \lambda_{21} = \exp(\beta_0 + \beta_1) & \lambda_{22} =\exp(\beta_0 + \beta_1 + \beta_2) \\ \end{array} \end{equation}
OLS \begin{equation} \begin{array}{ccc} & j=0 & j=1 \\ i=0 & \mu_{11} = \gamma_0 & \mu_{12} = \gamma_0 + \gamma_2 \\ i=1 & \mu_{21} = \gamma_0 + \gamma_1 & \mu_{22} =\gamma_0 + \gamma_1 + \gamma_2 \\ \end{array} \end{equation}
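A small pure-Python check of the constraint these no-interaction models impose (the parameter values are arbitrary placeholders; the result is structural): the Poisson model forces the ratio of rate ratios to be exactly 1, while the OLS model forces the difference of differences to be exactly 0.

```python
import math

# Arbitrary placeholder parameter values; any values give the same result
b0, b1, b2 = 0.5, 0.3, -0.2   # Poisson (log scale)
g0, g1, g2 = 5.0, 1.2, -0.8   # OLS

lam = {(i, j): math.exp(b0 + b1 * i + b2 * j) for i in (0, 1) for j in (0, 1)}
mu = {(i, j): g0 + g1 * i + g2 * j for i in (0, 1) for j in (0, 1)}

# No-interaction Poisson forces the ratio of rate ratios to be exactly 1 ...
ratio_of_ratios = (lam[1, 1] / lam[1, 0]) / (lam[0, 1] / lam[0, 0])
# ... while no-interaction OLS forces the difference of differences to be 0
diff_of_diffs = (mu[1, 1] - mu[1, 0]) - (mu[0, 1] - mu[0, 0])
print(round(ratio_of_ratios, 10), round(diff_of_diffs, 10))
```

This is exactly why the fitted means agree only in the saturated case: a 2x2 table of means generally has a non-unit ratio of ratios (and a non-zero difference of differences), which neither three-parameter model can accommodate.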