Choosing the Right Model for Longitudinal Count Data

generalized linear model, poisson distribution, r

I have four years of data on the number of foreign subsidiaries of 120 firms, along with some variables that explain this phenomenon. I obtained decent results using only one year of my data with a Poisson GLM. My dependent variable is a count ranging from 0 to a maximum of 25, so a Poisson regression seems OK (in fact quasipoisson was better because of slight overdispersion). I would like to use my full dataset to obtain even better results. Could someone tell me which model would be best for my data, and which R packages I could use?

My model is quite simple and looks like this:

subsidiaries <- glm(Y ~ X1 + X2 + X3 + SIC, family=poisson, data=mydata)

X1, X2, and X3 all vary from year to year and represent data such as foreign sales, assets, employees, etc. SIC is a dummy variable that I use to control for firm industry.

I would simply like to use a longitudinal model instead of this simple, cross-sectional one. I would only change my current model to include time effects. Thanks!

Best Answer

What you are looking for might be a Generalized Linear Mixed Model, i.e. a Poisson model with a random intercept to start with.

To motivate the Generalized Linear Mixed Model choice, I will provide some background from Mixed Models: Theory and Applications by E. Demidenko (in what follows, a "cluster" corresponds to a particular firm in your data):

Often data have a clustered (panel or tabular) structure. Classical statistics assumes that observations are independent and identically distributed (iid). Applied to clustered data, this assumption may lead to false results. In contrast, the mixed effects model treats clustered data adequately and assumes two sources of variation, within cluster and between clusters. Two types of coefficients are distinguished in the mixed model: population-averaged and cluster (or subject) - specific. The former have the same meaning as in classical statistics, but the latter are random and are estimated as posteriori means.

and:

The Generalized Linear Mixed Model (GLMM) is an extension of the Generalized Linear Model (GLM) complicated by random effects.
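The two sources of variation mentioned in the quote can be illustrated with a small base R simulation. This is only a sketch under assumed values: the firm-level standard deviation of 0.5 and the baseline of 1.5 on the log scale are hypothetical choices, not estimates from your data. If observations were iid, the variance of the firm means would be roughly the within-firm variance divided by the number of years; a much larger between-firm variance points to clustering.

```r
set.seed(42)
n_firms <- 120
n_years <- 4

# Assumed data-generating process: each firm has its own baseline on the log scale
firm_effect <- rnorm(n_firms, mean = 0, sd = 0.5)
firmID <- rep(1:n_firms, times = n_years)
y <- rpois(n_firms * n_years, lambda = exp(1.5 + firm_effect[firmID]))

firm_means  <- tapply(y, firmID, mean)       # one mean count per firm
between_var <- var(firm_means)               # variation between firms
within_var  <- mean(tapply(y, firmID, var))  # average variation within a firm

# Compare the between-firm variance with what iid data would suggest
c(between = between_var, within = within_var, iid_expected = within_var / n_years)
```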

I believe that you are interested in allowing each firm to have its own random intercept and its own random slope for year.
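Written out, a Poisson model with a firm-specific random intercept and a firm-specific random slope for year could look as follows. This is a sketch of the standard GLMM formulation, with notation chosen here for illustration: i indexes firms, t indexes years, and X1, X2 stand for time-varying covariates.

```latex
\begin{aligned}
Y_{it} \mid b_{0i}, b_{1i} &\sim \mathrm{Poisson}(\mu_{it}) \\
\log \mu_{it} &= \beta_0 + \beta_1\,\mathrm{year}_{t} + \beta_2\,X1_{it} + \beta_3\,X2_{it}
                 + b_{0i} + b_{1i}\,\mathrm{year}_{t} \\
(b_{0i}, b_{1i})^\top &\sim N(\mathbf{0}, \Sigma)
\end{aligned}
```

The fixed effects (the betas) are shared by all firms, while b_{0i} and b_{1i} shift the intercept and the year trend for firm i. In lme4 formula syntax the random part corresponds to (year | firmID).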

Such a model can be fitted with the glmer function from the lme4 package in R:

library(lme4)  # provides glmer()

set.seed(1)
# Longitudinal data in a "long" format: one row per firm and year
data.sim <- data.frame(firmID = rep(1:120, times = 4),
                       year = rep(1:4, each = 120),
                       X1 = rnorm(120*4),
                       X2 = runif(120*4),
                       res = rpois(120*4, lambda = 10))
head(data.sim)
#   firmID year         X1        X2 res
# 1      1    1 -0.6264538 0.3604340  11
# 2      2    1  0.1836433 0.4421617   7
# 3      3    1 -0.8356286 0.1257292   7
# 4      4    1  1.5952808 0.6243645   6
# 5      5    1  0.3295078 0.3024313   7
# 6      6    1 -0.8204684 0.2396372  12

# Fit a Poisson GLMM with a random intercept and a random slope for year per firm
model.glmer <- glmer(res ~ 1 + year + X1 + X2 + (year | firmID),
                     data = data.sim, 
                     family = poisson(link = "log"))
summary(model.glmer)
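Note that the simulated response above is drawn independently of the firm, so the estimated firm-level variances are likely to be close to zero there. As a starting point, it may be simpler to fit the random-intercept model first and then check whether a random slope for year is worth adding. The following is a minimal sketch under assumed values: the data-generating coefficients and the firm-level standard deviation of 0.5 are hypothetical and only serve to give the model something to detect.

```r
library(lme4)

set.seed(1)
n_firms <- 120
n_years <- 4

# Hypothetical firm-specific baselines (the "random intercepts")
firm_effect <- rnorm(n_firms, mean = 0, sd = 0.5)

data.re <- data.frame(firmID = rep(1:n_firms, times = n_years),
                      year   = rep(1:n_years, each = n_firms),
                      X1     = rnorm(n_firms * n_years))
# Counts depend on year, X1 and the firm's own baseline (assumed coefficients)
data.re$res <- rpois(nrow(data.re),
                     lambda = exp(1 + 0.1 * data.re$year + 0.2 * data.re$X1 +
                                  firm_effect[data.re$firmID]))

# Random intercept only
model.ri <- glmer(res ~ 1 + year + X1 + (1 | firmID),
                  data = data.re,
                  family = poisson(link = "log"))

# Random intercept and random slope for year; this may report a singular fit
# because no slope variation was simulated
model.rs <- glmer(res ~ 1 + year + X1 + (year | firmID),
                  data = data.re,
                  family = poisson(link = "log"))

VarCorr(model.ri)          # estimated between-firm standard deviation
anova(model.ri, model.rs)  # likelihood ratio comparison of the two models
```

If the comparison does not favour the random slope, the simpler random-intercept model is usually easier to fit and to interpret.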