Solved – How to validate a Multinomial Logit and Probit Model fit

goodness of fitlogitmultinomial-distributionprobitvalidation

I would like to know how do you determine the performance of your models. That is, if you fit a multinomial logit or probit model for un-ordered discrete choice. What do you use to evaluate whether you have a good model?

Please provide me with any reference material that I could read more on this topic. Websites, articles or book references on this topic will be a great help. I am kind of stuck!

I have R, SAS, Stata, SPSS, and Minitab available.

Best Answer

I would consider these tests at a minimum:

Hausman or Small-Hsiao tests of the IIA
Confusion matrices of predicted vs. actual outcome
Information criteria (AIC, BIC)
Various scalar measures of fit (like McFadden's $R^2$)
Wald or LR tests for combining alternatives

With Stata, check out the SPost from Long and Freese as their as their categorical variables book for code and a nice intro to all of these tests with examples.

Related Solutions

Solved – Estimate multinomial probit model with mlogit (R package)

The run with probit=TRUE has not converged to a good answer. See the line in the output that starts with 'last step could not find higher value' and compare the same section in the logit model output. The other reason it takes so long to fit the probit model is that the software is approximating a high dimensional integral using simulation (See the vignette for mlogit, pg 54). Sometimes rescaling covariates can help with numerical difficulties but that's not the case here, I tried

HS$ic <- scale(H$ic)
HS$oc <- scale(H$oc)

m2.probit = mlogit(depvar~ic+oc, HS, probit=TRUE)

and had the same difficulty. I would treat the results of the probit model with a degree of skepticism. In particular, there is something funny going on with the outcome 'gr' in the probit model (see intercepts, and variance parameter estimates).

The coefficients in the summary that are labeled er.gc, er.gr etc. are the parameters of the variance-covariance matrix that is being estimated as part of the probit model.

Solved – Probit/Logit Model – How to find the parameter $\beta$

It is hard to answer your question, as you do not provide the background you have. Here is a general answer.

You want to find $\mathbb{P}[Y_i = 1 | X_i]$ when you have a bunch of data that comes from the same distribution, that is, the result ($Y_i$) is influenced the same way by the inputs ($X_i$) of each sample you have. Read more: I.I.D. (Wikipedia).

You know $X_i$ and $Y_i$ for your samples, and you want to find a function that maps the inputs to the outputs as close as possible. This is what the probit/logit models are doing, and they have a parameter $\beta$ to dictate the influence of the different input variables. To find the best $\beta$, we have first to define what the best is.

In linear regression models, the best model is often defined by the model that has the best $R^2$ measure with the output, or the one that has the smallest Mean Squared Error (Wikipedia). In classification tasks, a common measure is the Logistic loss (Wikipedia).

The value of the loss function is $L(y, f(x,\beta))$ where $L$ is the loss function, $x,y$ the input and output variables, and $\beta$ are the parameters of your model. It is a comparison of your model with the real result $y$, and gives you a metric to evaluate your model. You want to have the lowest error with respect to $\beta$, $$\hat{\beta} = \arg \min_{\beta} L(y,f(x,\beta))$$ Depending on your model and cost function, you can equate the derivative to 0 and compute the $\beta$ that makes it possible based on the data. In the case of the transformed models you are interested about, this is not possible, as as mentionned in the comments, no closed solution exists. However, you can come very close to the optimal solution by using gradient descent (Wikipedia). The idea is that the derivative of the function at a certain $\beta$ will point to the direction of highest increase in the cost function with respect to $\beta$, and if you go in the other direction, your $\beta$ will improve and the error will lower. As long as your cost function is convex, you will fill a global minimum, the optimal $\beta$.

How to find the parameters of the logistic regression is the introduction of a lot of books and classes in Machine learning. For a more specific answer, I'd advise looking at

Elements of Statistical Learning, by Hastie, Tibshirani and Friedman (Book website, free download)
Pattern Recognition and Machine Learning, by Christopher Bishop.
Machine Learning: A Probabilistic Perspective, by Kevin Murphy.
Andrew Ng's Mooc
Any other book/mooc/class on statistical/machine learning will introduce this concept.

You can also take a look at questions on a similar topic on CrossValidated,

Best Answer

Related Solutions

Solved – Estimate multinomial probit model with mlogit (R package)

Solved – Probit/Logit Model – How to find the parameter $\beta$

Related Question