I have panel data of varying length for some 400+ companies. Each company is identified by a code.
date        code  var1  var2  var3  category
2016-01-01  AAA   1     2     3     2
2016-02-01  AAA   2     3     3     3
2016-01-01  BBB   1     2     3     1
2016-02-01  BBB   2     3     3     3
where category takes the value 1, 2 or 3.
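For anyone who wants to reproduce this, here is a minimal sketch of the sample rows as a data frame. I am assuming the name test (the name used in the model call further down) and that date is stored as a Date; the real data has 400+ companies, so these four rows only illustrate the layout.

```r
# Sample rows from the table above; `test` matches the data frame name in the pglm call.
test <- data.frame(
  date     = as.Date(c("2016-01-01", "2016-02-01", "2016-01-01", "2016-02-01")),
  code     = c("AAA", "AAA", "BBB", "BBB"),
  var1     = c(1, 2, 1, 2),
  var2     = c(2, 3, 2, 3),
  var3     = c(3, 3, 3, 3),
  category = c(2, 3, 1, 3),
  stringsAsFactors = FALSE
)

# Quick checks of the panel structure: rows per company and the category distribution.
table(test$code)
table(test$category)
```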
I want to do a regression to see which variables affect the Category.
So far I have worked out that I need to use the pglm function from the pglm package (Panel Estimators for Generalized Linear Models). Information on pglm is quite limited and consists basically of examples.
Although I wanted to run the model with fixed effects (at the date and company level), pglm does not seem to allow this (see the related question "Why pglm fails for within model?").
So I ran it with model = "pooling":
library(pglm)

# Ordered logit on the pooled panel; index = c(individual, time)
formula_lm1 <- category ~ var1 + var2 + var3
f_pglm <- pglm(
  formula_lm1,
  data = test,
  family = ordinal('logit'),
  model = "pooling",
  index = c('code', 'date'),
  print.level = 0,
  method = 'nr'
)
summary(f_pglm)
Questions:
- Am I using the correct approach, i.e. the correct function for my task?
- Does anyone know whether using index = c('code', 'date') allows for fixed effects?
- With my regression, how do I adjust the standard errors?
- Are there any alternative packages/functions I could use for my task?
Best Answer
Yes, I think you could use R packages that are traditionally used for choice modelling (e.g., purchase decisions in supermarkets). In your case you could estimate a mixed logit (random parameters logit) model to account for the panel nature of the data, i.e., multiple observations per respondent / company, although this does not really account for the longitudinal aspect of your data. A good package for this type of task would be "mlogit" (https://cran.r-project.org/web/packages/mlogit/vignettes/mlogit.html). Plenty of examples are available there.
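As a rough starting point, the sketch below reshapes a panel into the long format mlogit expects (one row per company-month and category) and fits a plain multinomial logit. It is only an illustration under several assumptions: the data are simulated, the helper columns chid, alt and choice are names I made up, and I use mlogit.data(), which recent mlogit versions keep mainly for backward compatibility in favour of dfidx.

```r
library(mlogit)

set.seed(42)

# Simulated stand-in for the real panel: 60 companies, 8 months each (all values made up).
n_firms   <- 60
n_periods <- 8
panel <- data.frame(
  code = rep(sprintf("C%03d", seq_len(n_firms)), each = n_periods),
  date = rep(seq(as.Date("2016-01-01"), by = "month", length.out = n_periods),
             times = n_firms),
  var1 = rnorm(n_firms * n_periods),
  var2 = rnorm(n_firms * n_periods),
  var3 = rnorm(n_firms * n_periods)
)
panel$category <- sample(1:3, nrow(panel), replace = TRUE)
panel$chid <- seq_len(nrow(panel))  # one choice situation per company-month

# Long format: one row per (choice situation, alternative) with a logical choice flag.
long <- merge(panel, data.frame(alt = 1:3))  # no shared columns, so this is a cross join
long$choice <- long$category == long$alt
long <- long[order(long$chid, long$alt), ]

ml_data <- mlogit.data(long, choice = "choice", shape = "long",
                       alt.var = "alt", chid.var = "chid", id.var = "code")

# Covariates after the bar vary by company-month, not by category,
# so each non-reference category gets its own intercept and slopes.
fit_mnl <- mlogit(choice ~ 0 | var1 + var2 + var3, data = ml_data)
summary(fit_mnl)
```

To move from this pooled fit towards the mixed logit, mlogit() takes an rpar argument (a named vector giving the distribution of each random coefficient, with names matching names(coef(fit_mnl))), together with panel = TRUE and R for the number of draws. Note that this setup treats category as unordered, so it ignores the 1 < 2 < 3 ordering that the ordinal model in the question uses.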