Solved – How to choose between logit, probit or linear probability model

econometrics, generalized linear model, logistic, probit

To decide whether to use a logit, probit, or linear probability model, I compared the marginal effects from the logit and probit models with the coefficients from the linear probability model. However, they are not similar, so I am not sure how to choose the model that fits best.

Best Answer

Modeling a dichotomous outcome using linear regression is a big no-no. The error terms cannot be normally distributed (for any given set of covariate values the error can take only two values), they are heteroskedastic by construction, and predicted values can fall outside the logical boundaries of 0 and 1.
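To make the out-of-range problem concrete, here is a minimal sketch using only the Python standard library. The ten data points and the helper name fit_ols are made up for illustration; they are not from the question.

```python
# Hypothetical toy data: the outcome switches from 0 to 1 as x increases.
xs = list(range(10))
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]


def fit_ols(x, y):
    """Return (intercept, slope) of a one-predictor least-squares fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Linear probability model: ordinary least squares on a 0/1 outcome.
intercept, slope = fit_ols(xs, ys)
lpm_predictions = [intercept + slope * xi for xi in xs]

for xi, p in zip(xs, lpm_predictions):
    flag = "" if 0.0 <= p <= 1.0 else "  <- outside [0, 1]"
    print(f"x={xi}  predicted P(y=1)={p:.3f}{flag}")
```

With this toy data the fitted line dips below 0 at the low end of x and rises above 1 at the high end, which is exactly the boundary problem described above.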

Logit and probit differ in the distribution they assume for the error term of an underlying latent variable. In both models the observed outcome either happens or doesn't, and this reflects whether the latent variable has crossed a certain threshold. Logit assumes the error in that latent variable follows a logistic distribution; probit assumes it is normally distributed.

In practice the end result of these different distributional assumptions is that the coefficients differ: logit coefficients are usually about 1.6 times the corresponding probit coefficients, because the standard logistic distribution has a larger variance than the standard normal. However, if you look at marginal effects (meaning the effect of a covariate on the predicted probability of the outcome, either holding the other covariates at their means or averaging over the observed values) the logit and probit models will make essentially the same predictions. So if you're looking at marginal effects the choice probably doesn't matter.
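A rough numerical check of the 1.6 rule of thumb, again using only the standard library. The probit coefficient of 0.5 is a hypothetical value chosen for illustration, and the logit coefficient is simply assumed to be 1.6 times as large rather than estimated from data.

```python
import math


def logistic_cdf(z):
    """Link function behind logit."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Link function behind probit."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


SCALE = 1.6                       # rule-of-thumb logit/probit coefficient ratio
beta_probit = 0.5                 # hypothetical probit coefficient
beta_logit = SCALE * beta_probit  # assumed, not estimated


def probit_marginal_effect(index):
    """dP/dx for probit at a given value of the probit linear index."""
    return normal_pdf(index) * beta_probit


def logit_marginal_effect(index):
    """dP/dx for logit at the same point, with the index rescaled by SCALE."""
    p = logistic_cdf(SCALE * index)
    return p * (1.0 - p) * beta_logit


# Largest gap between the two predicted probabilities over a wide grid.
grid = [i / 100 for i in range(-400, 401)]
max_gap = max(abs(logistic_cdf(SCALE * z) - normal_cdf(z)) for z in grid)
print(f"largest gap in predicted probability: {max_gap:.4f}")

for index in (-1.0, 0.0, 1.0):
    print(f"index={index:+.1f}  probit ME={probit_marginal_effect(index):.4f}"
          f"  logit ME={logit_marginal_effect(index):.4f}")
```

After rescaling, the two curves stay within about two percentage points of each other everywhere on the grid, and the marginal effects at these points are nearly the same even though the raw coefficients differ by a factor of 1.6.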

On the other hand, if you're not going to calculate the margins, then logit has the obvious advantage that exponentiating a coefficient gives the familiar odds ratio. Probit coefficients are essentially uninterpretable on their own, so given a probit model I would report average marginal effects. Of course, most people improperly interpret odds ratios as changes in probability, which is a big no-no. The odds of an outcome are the probability that it occurs divided by the probability that it does not (odds of 1 correspond to a probability of .5). Odds RATIOS, then, reflect the predicted multiplicative change in the odds given a 1 unit change in the predictor. Thus, the odds ratio reflects change relative to the base odds of the outcome occurring. Given an outcome that either rarely occurs or almost always occurs, a small change in probability can correspond to a large odds ratio. Odds ratios are a ratio of ratios, which can be quite confusing, and so we arrive at a reason to report marginal effects even in the context of a logit model.
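A short sketch of why an odds ratio is not a change in probability. The helper apply_odds_ratio and the baseline probabilities are illustrative choices, not part of any library.

```python
def apply_odds_ratio(p_baseline, odds_ratio):
    """Return the probability obtained by multiplying the baseline odds by odds_ratio."""
    odds = p_baseline / (1.0 - p_baseline)  # probability -> odds
    new_odds = odds * odds_ratio            # the odds ratio acts on the odds
    return new_odds / (1.0 + new_odds)      # odds -> probability


# The same odds ratio of 2 applied at three hypothetical baseline probabilities.
for p0 in (0.01, 0.50, 0.99):
    p1 = apply_odds_ratio(p0, 2.0)
    print(f"baseline p={p0:.2f}  new p={p1:.4f}  change={p1 - p0:+.4f}")
```

The same odds ratio of 2 moves a 1% baseline by about one percentage point but moves a 50% baseline by roughly seventeen points, so the size of the odds ratio alone says little about the change in probability.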

So, to summarize, don't use a linear probability model. Use logit or probit and report the marginal effects. The choice is, perhaps, of theoretical significance but probably of no practical consequence if reporting marginal effects. If you're not going to report marginal effects then use logit but be sure to properly interpret the odds ratios so you don't look like an uninformed idiot.