Solved – Robust fitting of discrete choice model in R

discrete data, glmnet, multinomial-distribution, r, regression

I'd be really grateful for recommendations of a robust package for fitting discrete choice models to a large amount of data ($n$ in the millions and $p$ in the 2000 range). I want a smoothed (penalized) model that handles multicollinear predictors and matrix-inversion issues sensibly, as glmnet does. I'm happy to bootstrap samples, which may be the only way to deal with data this large in R.

I've tried the mlogit package, and it falls apart with more than a few hundred predictors, producing matrix-inversion errors.

My alternative is to use the glmnet package for binary regression and then combine the binary fits to approximate the discrete choice model, using Begg and Gray's approximation.
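A minimal sketch of the Begg and Gray idea, assuming a baseline-category multinomial logit over a fixed set of classes with class 1 as the reference: each non-reference class k gets its own binary logistic regression, fitted only on the rows whose outcome is either the reference class or class k. Base R `glm()` stands in for glmnet so the example runs without extra packages; the simulated data and the helper `begg_gray()` are hypothetical. With glmnet, each binary fit would instead be a penalized `family = "binomial"` call.

```r
# Begg & Gray "individualized regressions" (sketch): approximate a
# baseline-category multinomial logit with K - 1 separate binary logits.
set.seed(1)
n  <- 5000
x1 <- rnorm(n)
x2 <- rnorm(n)

# Hypothetical 3-class data; class 1 is the reference (linear predictor 0).
eta  <- cbind(0, 1 + 0.5 * x1 - x2, -0.5 + x1 + 0.5 * x2)
prob <- exp(eta) / rowSums(exp(eta))
y    <- apply(prob, 1, function(p) sample(3, 1, prob = p))

begg_gray <- function(y, X, ref = 1) {
  classes <- setdiff(sort(unique(y)), ref)
  fits <- lapply(classes, function(k) {
    keep <- y %in% c(ref, k)               # reference rows plus class-k rows
    resp <- as.integer(y[keep] == k)       # 1 = class k, 0 = reference class
    coef(glm(resp ~ ., data = data.frame(resp, X[keep, , drop = FALSE]),
             family = binomial()))
  })
  out <- do.call(rbind, fits)              # one row of coefficients per class
  rownames(out) <- paste0("class_", classes, "_vs_", ref)
  out
}

est <- begg_gray(y, cbind(x1 = x1, x2 = x2))
print(round(est, 2))
```

Each binary fit is consistent for the matching multinomial coefficients but uses only part of the data, so it is less efficient than a joint fit. It also presumes a fixed set of classes, which the discrete choice setting in this question does not have.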

This data is not multinomial in the traditional sense. It is discrete-choice data: the classes themselves change from observation to observation, and possibly so does the number of classes. Each class has its own class-specific set of predictors, all measured on the same scale (cf. Discrete Choice Models). I wrote to the maintainer of glmnet, Trevor Hastie, who says there is no mapping to discrete choice models in their package.
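One way to state the difference, in notation introduced here rather than taken from the post: for observation $i$ with choice set $C_i$ (whose members and size may vary with $i$) and class-specific attribute vectors $x_{ij}$, the conditional logit uses a single coefficient vector $\beta$ shared by all alternatives:

$$
P(y_i = j \mid C_i) = \frac{\exp(x_{ij}^\top \beta)}{\sum_{k \in C_i} \exp(x_{ik}^\top \beta)}, \qquad j \in C_i,
$$

whereas the usual multinomial logit has one predictor vector $x_i$ per observation and class-specific coefficients $\beta_j$ over a fixed set of classes $\{1, \dots, K\}$:

$$
P(y_i = j) = \frac{\exp(x_i^\top \beta_j)}{\sum_{k=1}^{K} \exp(x_i^\top \beta_k)}.
$$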

Another name for discrete choice, with the correct parameterization, is the conditional logit. I found the package pglm, but it also lacks robustness. There is also a reference, Discrete choice panel models in R, that points to lme4, but I have found no examples of fitting a conditional logit with it.
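A minimal sketch of what a "glmnet-like" conditional logit would optimize, in base R with hypothetical names (`fit_clogit_ridge()`, simulated data). With $c_i$ the chosen alternative in choice set $C_i$, it maximizes $\ell(\beta) = \sum_i \big[x_{ic_i}^\top \beta - \log \sum_{k \in C_i} \exp(x_{ik}^\top \beta)\big] - \frac{\lambda}{2}\lVert \beta \rVert^2$ with `optim()`. This is not glmnet and has no lasso path; it only shows that an L2 penalty keeps the problem well posed when predictors are collinear, which is where unpenalized fits hit the matrix-inversion errors described above. For the unpenalized model, `survival::clogit()` fits the conditional logit as a stratified Cox model, and glmnet 4.1 and later accept stratified Cox responses via `stratifySurv()`, which may offer a penalized route (check the documentation for your version).

```r
# Ridge-penalised conditional logit with choice sets of varying size.
# Hand-rolled sketch in base R; not the API of any package.
set.seed(42)
n_sets    <- 2000
beta_true <- c(1, -0.5, 0.25, 0, 2)
p         <- length(beta_true)

# Long format: one row per (choice set, alternative); 2 to 5 alternatives per set.
sizes  <- sample(2:5, n_sets, replace = TRUE)
set_id <- rep(seq_len(n_sets), times = sizes)
X      <- matrix(rnorm(length(set_id) * p), ncol = p)

# Random-utility simulation: Gumbel errors yield conditional-logit choices.
util   <- drop(X %*% beta_true) - log(-log(runif(nrow(X))))
chosen <- as.integer(ave(util, set_id, FUN = function(u) u == max(u)))

fit_clogit_ridge <- function(X, set_id, chosen, lambda = 1) {
  # Per-row log of the within-set denominator, stabilised by the set maximum.
  log_denom <- function(eta) {
    m <- ave(eta, set_id, FUN = max)
    log(ave(exp(eta - m), set_id, FUN = sum)) + m
  }
  # Negative penalised log-likelihood.
  fn <- function(beta) {
    eta <- drop(X %*% beta)
    -sum((eta - log_denom(eta))[chosen == 1]) + 0.5 * lambda * sum(beta^2)
  }
  # Gradient: minus the sum over rows of (chosen - prob) * x, plus ridge term.
  gr <- function(beta) {
    eta  <- drop(X %*% beta)
    prob <- exp(eta - log_denom(eta))
    -drop(crossprod(X, chosen - prob)) + lambda * beta
  }
  opt <- optim(rep(0, ncol(X)), fn, gr, method = "BFGS",
               control = list(maxit = 500))
  list(coef = opt$par, convergence = opt$convergence)
}

# An exact duplicate column makes the unpenalised Hessian singular.
X_dup <- cbind(X, X[, 5])
fit   <- fit_clogit_ridge(X_dup, set_id, chosen, lambda = 1)
print(round(fit$coef, 3))
```

Because the duplicated columns enter the likelihood identically, only their sum is identified; the ridge term resolves this by splitting the effect (about 2) evenly between them. A lasso or elastic-net path like glmnet's would need a coordinate-descent or proximal solver in place of `optim()`.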

Best Answer

I know that you have asked for an R package, and I totally understand your desire (as I use R for almost everything). But you should really check out PythonBiogeme for this level of problem.

Related Solutions

Solved – glmnet: How to make sense of multinomial parameterization

Regarding the parameters from multinom and glmnet, I found this answer helpful: Can I use glm algorithms to do a multinomial logistic regression?

In particular: "Yes, with a Poisson GLM (log linear model) you can fit multinomial models. Hence multinomial logistic or log linear Poisson models are equivalent."
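A minimal sketch of that equivalence (the "Poisson trick"), in base R with hypothetical simulated data: expand each observation into one row per category with a 0/1 count, then fit a Poisson log-linear model with one nuisance intercept per observation (`obs`) plus category-specific intercepts and slopes. The per-observation intercept absorbs the multinomial normalizing denominator, so the category-specific coefficients equal the multinomial MLE with category 1 as the reference. The column names (`a2`, `x1_2`, ...) are illustrative.

```r
# Multinomial logit fitted as a Poisson log-linear model (sketch).
set.seed(7)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
eta  <- cbind(0, 1 + x1 - 0.5 * x2, -0.5 + 0.5 * x1 + x2)
prob <- exp(eta) / rowSums(exp(eta))
y    <- apply(prob, 1, function(p) sample(3, 1, prob = p))

# Long format: one row per (observation, category) with a 0/1 count.
k   <- rep(1:3, times = n)               # category of each long row
idx <- rep(seq_len(n), each = 3)         # observation of each long row
long <- data.frame(
  count = as.integer(k == y[idx]),
  obs   = factor(idx),                   # nuisance intercept per observation
  a2    = as.integer(k == 2),            # category-2 intercept
  x1_2  = x1[idx] * (k == 2),            # category-2 slopes
  x2_2  = x2[idx] * (k == 2),
  a3    = as.integer(k == 3),            # category-3 intercept
  x1_3  = x1[idx] * (k == 3),            # category-3 slopes
  x2_3  = x2[idx] * (k == 3)
)

fit <- glm(count ~ 0 + obs + a2 + x1_2 + x2_2 + a3 + x1_3 + x2_3,
           family = poisson(), data = long)
cf <- coef(fit)[c("a2", "x1_2", "x2_2", "a3", "x1_3", "x2_3")]
print(round(cf, 3))
```

If nnet is installed, `coef(nnet::multinom(factor(y) ~ x1 + x2))` should give the same numbers up to optimizer tolerance. The trick is exact but needs one parameter per observation, so it is a teaching device rather than a route to $n$ in the millions.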

Below, I show how to reparameterize the glmnet coefficients as multinom coefficients.

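library(nnet)    # multinom()
library(glmnet)  # glmnet()

set.seed(1)
n.subj <- 1000
x1 <- rnorm(n.subj)
x2 <- rnorm(n.subj)
# Unnormalized class probabilities; category 1 is the reference
prob <- cbind(1, exp(3 + 2*x1 + x2), exp(-1 + x1 - 3*x2))
prob <- prob / rowSums(prob)

# Draw one category per subject from that subject's probabilities
y <- apply(prob, 1, function(p) sample(3, 1, prob = p))

multinom(y ~ x1 + x2)

x <- cbind(x1, x2); y2 <- factor(y)
fit <- glmnet(x, y2, family = "multinomial", lambda = 0, type.multinomial = "grouped")
cf <- coef(fit)   # list of sparse coefficient matrices, one per category

# Differences from category 1 give multinom's baseline-category coefficients
as.matrix(cf[[2]] - cf[[1]])   # category 2 vs 1
as.matrix(cf[[3]] - cf[[1]])   # category 3 vs 1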
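The subtraction in the code above works because glmnet's multinomial model is parameterized symmetrically, with one intercept and coefficient vector per class:

$$
P(y = k \mid x) = \frac{\exp(\beta_{0k} + x^\top \beta_k)}{\sum_{l=1}^{K} \exp(\beta_{0l} + x^\top \beta_l)}.
$$

Adding the same $(c_0, c)$ to every $(\beta_{0k}, \beta_k)$ multiplies numerator and denominator by $\exp(c_0 + x^\top c)$, so only differences between classes are identified. Choosing $c_0 = -\beta_{01}$ and $c = -\beta_1$ gives $\beta_{0k} - \beta_{01}$ and $\beta_k - \beta_1$, the baseline-category coefficients that multinom reports with class 1 as reference. This match holds for the unpenalized fit (`lambda = 0`); with a positive penalty the differences are shrunken estimates rather than multinom's MLE.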

Hope this helps. That said, I don't fully understand the equivalence between the Poisson generalized linear model and the multinomial logistic model.

Please let me know if there is a good, readable, and "easily" understandable source.

Solved – How to estimate mixed logit (or random parameter) discrete choice models in R

There are several packages you can use in R for mixed logit estimation. Some use frequentist approaches based on simulated maximum likelihood (S-ML); others use Bayesian approaches based on Markov chain Monte Carlo.

R packages based on S-ML:

1) mlogit

2) gmnl
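A minimal sketch of simulated maximum likelihood for a panel mixed logit, the estimation principle behind the packages above, in base R with hypothetical names (`sim_negloglik()`, simulated data). Assumptions: three alternatives, one attribute whose coefficient varies across people as $\beta_n \sim N(\mu, \sigma^2)$, and one fixed coefficient $\gamma$. Each person's likelihood is the product of their logit probabilities over their choice tasks, averaged over $R$ draws of $\beta_n$; the draws are held fixed across iterations so the objective is smooth, and $\sigma$ is estimated on the log scale to keep it positive. Real packages add refinements such as quasi-random (Halton) draws and correlated random coefficients, which this sketch omits.

```r
# Panel mixed logit by simulated maximum likelihood (sketch).
set.seed(123)
N <- 300; Tn <- 6; J <- 3; R <- 50
pid <- rep(seq_len(N), each = Tn)                 # person of each choice task
X1  <- matrix(rnorm(N * Tn * J), ncol = J)        # attribute with random coefficient
X2  <- matrix(rnorm(N * Tn * J), ncol = J)        # attribute with fixed coefficient

# True model: beta_n ~ N(1, 0.8^2) per person, gamma = -0.5; Gumbel errors.
beta_n <- rnorm(N, mean = 1, sd = 0.8)
U      <- beta_n[pid] * X1 - 0.5 * X2 -
          log(-log(matrix(runif(N * Tn * J), ncol = J)))
choice <- max.col(U)                              # chosen alternative per task

Z <- matrix(rnorm(N * R), nrow = N)               # fixed standard-normal draws

sim_negloglik <- function(theta) {
  mu <- theta[1]; sigma <- exp(theta[2]); gamma <- theta[3]
  chosen_idx <- cbind(seq_along(choice), choice)
  lik <- matrix(0, nrow = N, ncol = R)
  for (r in seq_len(R)) {
    b <- (mu + sigma * Z[, r])[pid]               # person's draw, repeated per task
    V <- b * X1 + gamma * X2
    V <- V - V[cbind(seq_len(nrow(V)), max.col(V, ties.method = "first"))]
    p_chosen <- exp(V[chosen_idx]) / rowSums(exp(V))
    lik[, r] <- exp(rowsum(log(p_chosen), pid))   # product over a person's tasks
  }
  -sum(log(rowMeans(lik)))                        # average over draws, then log
}

start <- c(0, log(0.5), 0)
opt   <- optim(start, sim_negloglik, method = "BFGS")
est   <- c(mu = opt$par[1], sigma = exp(opt$par[2]), gamma = opt$par[3])
print(round(est, 3))
```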

R packages based on Bayesian estimation:

1) RSGHB

https://github.com/RSGInc/RSGHB_Examples

https://cran.r-project.org/web/packages/RSGHB/vignettes/RSGHB_HowTo.pdf

https://cran.r-project.org/web/packages/RSGHB/RSGHB.pdf

http://help.statwizards.com/data-wizard/statistics_programs/r_-_rsghb_package.htm

2) RStan based: https://rawgit.com/rtrangucci/class_20170809/master/multinomial-logit/multinomial-logit-regression.html

There was also some code based on mlogit by Daniel Guhl posted somewhere, which I found easy to implement at the time, but I cannot find it at the moment.

In Stata, the major players are:

1) Hong Il Yoo (Durham Business School; packages lclogit and lclogitml), and

2) especially Arne Hole (Sheffield; packages mixlogit and mixlogitwtp, the latter for models in WTP space as in Scarpa, Thiene and Train, AJAE 2008, and Train & Weeks 2005). Look them up on the web.

Good luck
