Solved – Panel data and multinomial logistic regression in R

I have panel data of varying length for some 400+ companies. Each company is identified by a code.

date         code    var1   var2  var3    category
2016-01-01   AAA        1     2      3    2
2016-02-01   AAA        2     3      3    3
2016-01-01   BBB        1     2      3    1
2016-02-01   BBB        2     3      3    3

where the category is 1, 2 or 3

I want to run a regression to see which variables affect the category.

So far I have worked out that I need the pglm function from the pglm package (Panel Estimators for Generalized Linear Models).

Information on pglm is quite limited and is basically just examples.

I wanted to run the model with fixed effects (at the date and company level), but pglm does not seem to allow this (see "Why pglm fails for within model?").

So I ran it with model = "pooling":

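library(pglm)

# 'test' is the panel data frame shown above
formula_lm1 <- category ~ var1 + var2 + var3

# Pooled ordered logit; 'index' names the company and time identifiers
f_pglm <- pglm(
    formula_lm1,
    data = test,
    family = ordinal('logit'),
    model = "pooling",
    index = c('code', 'date'),
    print.level = 0,
    method = 'nr'
)
summary(f_pglm)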

Questions:

Am I using the correct approach, i.e. the correct function for my task?
Does specifying index = c('code', 'date') give me fixed effects?
With my regression, how do I adjust standard errors?
Are there any alternative packages/functions to use for my task?

Best Answer

Yes, I think you could use R packages that are traditionally used for choice modelling (e.g., purchase decisions in supermarkets). In your case you could estimate a mixed logit (random parameters logit) model to account for the panel nature of the data, i.e., multiple observations per respondent or company, although this does not really account for the longitudinal aspect of your data. A good package for this type of task would be "mlogit" (https://cran.r-project.org/web/packages/mlogit/vignettes/mlogit.html). Plenty of examples are available.
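
As a rough starting point, here is a minimal sketch of a plain (pooled) multinomial logit fitted with mlogit. It is not the mixed logit itself: the data are simulated stand-ins for the poster's `test` data frame, and the sizes `n_codes` and `n_dates` are made up for illustration. Note that mlogit treats the three categories as unordered, whereas the pglm call in the question uses an ordinal family. Random parameters would be added through the `rpar` and `panel = TRUE` arguments of `mlogit()`; that step is left out here because it depends on which coefficients you want to vary across companies.

```r
library(mlogit)

# Simulated stand-in for the poster's data (hypothetical values)
set.seed(1)
n_codes <- 50   # number of companies (assumption)
n_dates <- 6    # months per company (assumption)
n <- n_codes * n_dates

test <- data.frame(
  code = rep(sprintf("C%03d", seq_len(n_codes)), each = n_dates),
  date = rep(format(seq(as.Date("2016-01-01"), by = "month",
                        length.out = n_dates)), times = n_codes),
  var1 = rnorm(n),
  var2 = rnorm(n),
  var3 = rnorm(n),
  stringsAsFactors = FALSE
)
test$category <- factor(sample(1:3, n, replace = TRUE))

# One row per company-month ("wide" shape); id.var marks repeated
# observations that belong to the same company
ml <- mlogit.data(test, choice = "category", shape = "wide",
                  id.var = "code")

# var1..var3 vary by observation, not by alternative, so they go in
# the second part of the formula; category 1 is the reference level
fit <- mlogit(category ~ 0 | var1 + var2 + var3, data = ml,
              reflevel = "1")
summary(fit)
```

Each non-reference category gets its own intercept and its own coefficient on var1, var2 and var3, so the summary reports eight coefficients, each read relative to category 1.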
