Solved – Robust fitting of discrete choice model in R

discrete-data, glmnet, multinomial-distribution, r, regression

I'd be really grateful for recommendations of a robust package for fitting discrete choice models to a large amount of data ($n$ in the millions and $p$ in the 2000 range). I want a smoothed model that can deal sensibly with multicollinear predictors and matrix inversion issues, like glmnet. I'm happy to bootstrap samples, which may be the only way to deal with big data in R.

I've tried using the mlogit package and it falls apart with more than a few hundred predictors, producing errors to do with matrix inversion.

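My alternative is to use the glmnet package for binary regression and then use transforms to approximate the discrete choice model via Begg and Gray's approximation, which replaces the joint multi-class fit with separate binary logistic regressions of each class against a baseline class.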

This data is not multinomial in the traditional sense. It is discrete-choice: the classes themselves change from observation to observation, and possibly the number of classes as well. Each class has a set of class-specific predictors, all measured on the same scale (cf. Discrete Choice Models). I wrote to the maintainer of glmnet, Trevor Hastie, who says there is no mapping to discrete choice models in their package.
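
One way to write this structure down, with notation assumed here rather than taken from any package: let $C_i$ be the set of classes available to observation $i$, $x_{ij}$ the predictor vector of class $j$ for that observation, and $\beta$ a single coefficient vector shared across classes. The choice probability is then

$$P(y_i = j \mid C_i) = \frac{\exp(x_{ij}^\top \beta)}{\sum_{k \in C_i} \exp(x_{ik}^\top \beta)}, \qquad j \in C_i,$$

and a glmnet-style penalized fit would minimize

$$-\frac{1}{n} \sum_{i=1}^{n} \log P(y_i \mid C_i) + \lambda \left[ \frac{1-\alpha}{2} \lVert \beta \rVert_2^2 + \alpha \lVert \beta \rVert_1 \right].$$

The sum in the denominator runs over $C_i$, which may differ in both content and size between observations. That is the part an ordinary multinomial fit with a fixed class list does not express.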

With the correct parameterization, a discrete choice model is also known as a conditional logit. I found the package pglm, but it is also lacking in robustness. There is also a reference to lme4 in "Discrete choice panel models in R", but I have found no examples of fitting a conditional logit with it.
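
To illustrate what a regularized conditional logit does with collinear predictors and varying choice sets, here is a minimal sketch in plain Python (standard library only). It is not the API of mlogit, pglm, glmnet, or any other package. The function names, the toy data, the ridge weight `lam`, and the step size are all assumptions made for illustration. It uses a simple ridge penalty and fixed-step gradient descent, which would not scale to millions of rows as written.

```python
import math

def choice_probs(beta, alternatives):
    """Softmax over one observation's own choice set (list of feature vectors)."""
    utils = [sum(b * x for b, x in zip(beta, feats)) for feats in alternatives]
    top = max(utils)  # subtract the max for numerical stability
    exps = [math.exp(u - top) for u in utils]
    total = sum(exps)
    return [e / total for e in exps]

def penalized_nll(beta, data, lam):
    """Negative log-likelihood plus a ridge penalty 0.5 * lam * ||beta||^2."""
    nll = 0.0
    for alternatives, chosen in data:
        nll -= math.log(choice_probs(beta, alternatives)[chosen])
    return nll + 0.5 * lam * sum(b * b for b in beta)

def gradient(beta, data, lam):
    """Gradient of penalized_nll: expected features minus chosen features."""
    grad = [lam * b for b in beta]
    for alternatives, chosen in data:
        probs = choice_probs(beta, alternatives)
        for k in range(len(beta)):
            expected = sum(p * feats[k] for p, feats in zip(probs, alternatives))
            grad[k] += expected - alternatives[chosen][k]
    return grad

def fit(data, n_features, lam=1.0, step=0.05, iters=500):
    """Fixed-step gradient descent starting from beta = 0 (illustrative only)."""
    beta = [0.0] * n_features
    for _ in range(iters):
        g = gradient(beta, data, lam)
        beta = [b - step * gk for b, gk in zip(beta, g)]
    return beta

# Hypothetical toy data: each row is (choice set, index of the chosen class).
# Choice sets have 2, 3, 4, and 2 classes; the second feature is exactly
# twice the first, so the unpenalized problem is not identified.
data = [
    ([[1.0, 2.0], [0.0, 0.0]], 0),
    ([[0.5, 1.0], [1.5, 3.0], [0.0, 0.0]], 1),
    ([[2.0, 4.0], [1.0, 2.0], [0.5, 1.0], [0.0, 0.0]], 0),
    ([[1.0, 2.0], [2.0, 4.0]], 0),
]

beta_hat = fit(data, n_features=2, lam=1.0)
print("coefficients:", beta_hat)
print("penalized NLL:", penalized_nll(beta_hat, data, 1.0))
```

Because the two toy features are perfectly collinear, an unpenalized fit has no unique solution, which is the kind of situation that tends to surface as a matrix inversion error in Newton-type fitters. The ridge term picks one finite solution without inverting anything.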

Best Answer

I know that you have asked for an R package, and I totally understand your desire (as I use R for almost everything). But you should really check out PythonBiogeme for this level of problem.
