Solved – 2SLS probit vs LPM

2sls, econometrics, instrumental-variables, probit, regression

I am using 2SLS to estimate the effect of education on the probability that one works. In the first stage I regress education on my instrument and the other exogenous control variables. The same exogenous control variables are then included in the second stage.

The LPM version is obtained in Stata by the following command:

ivregress 2sls emp i.country i.cohort (education=instrument) 

However, I cannot decide whether to use probit instead. In the literature I have mostly found support for probit. When is the LPM consistent, and when is it preferred to probit?

Best Answer

To cite from Angrist and Pischke's (2009) Mostly Harmless Econometrics,

"...while a nonlinear model may fit the CEF (conditional expectation function) for LDVs (limited dependent variable models) more closely than a linear model, when it comes to marginal effects, this probably matters little. This optimistic conclusion is not a theorem, but as in the empirical example here, it seems to be fairly robustly true." (p. 107)

So if you are interested in the average causal effect (which the set-up of your question suggests), then either LPM or IV probit should be fine. Both have their advantages and disadvantages, though.

For instance, if you are interested in prediction, then LPM is a poor choice because its predicted probabilities are not restricted to lie between zero and one. If your errors are clustered (in your case, people in the same region are likely to be subject to similar shocks to their employment status), the standard errors are more easily adjusted in LPM. IV probit, on the other hand, is much more expensive computationally, and its raw coefficients are not directly interpretable as effects on the probability, so you also need to calculate marginal effects. In Stata you can do this with the margins command.
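As a rough sketch of how the two approaches could be compared in Stata, the commands below reuse the variable names from the question. The cluster identifier `region` is hypothetical (the question does not name such a variable), and the sketch assumes the question's dataset is already in memory and that education can be treated as continuous, which ivprobit requires of the endogenous regressor.

```stata
* LPM by 2SLS, as in the question, with standard errors clustered on a
* hypothetical region identifier (not a variable named in the question)
ivregress 2sls emp i.country i.cohort (education = instrument), vce(cluster region)

* IV probit by maximum likelihood, same specification and clustering
ivprobit emp i.country i.cohort (education = instrument), vce(cluster region)

* Average marginal effect of education on Pr(emp = 1), to set against
* the LPM coefficient on education
margins, dydx(education) predict(pr)
```

If the average marginal effect reported by margins is close to the 2SLS coefficient on education, that would be in line with the Angrist and Pischke remark quoted above. A large gap would be a reason to look more closely at both specifications.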

For further discussion of LPM and IV probit have a look at these notes from page 34 onwards. The argument that LPM is fine in this case is also made in Wooldridge (2010) Econometric Analysis of Cross Section and Panel Data.

Even though this is the current general opinion on LPM vs. IV probit/logit, there is some recent work that seeks to show that LPM is not that good after all. The main reference for this should be Lewbel et al. (2012). However, their example against LPM is rather constructed, as it applies only to fairly extreme data cases. It might still be worth a look, because they also compare different methods.