Probit Two-Stage Least Squares (2SLS) – Overview and Application

Tags: 2sls, binary data, instrumental-variables, probit

I was told that it's possible to run a two-stage IV regression where the first stage is a probit and the second stage is OLS. Is it also possible to use 2SLS if the first stage is a probit and the second stage is a probit or Poisson model?

Best Answer

What was proposed to you is sometimes called a forbidden regression, and in general it will not consistently estimate the relationship of interest. Forbidden regressions produce consistent estimates only under very restrictive assumptions that rarely hold in practice (see, for instance, Wooldridge (2010) "Econometric Analysis of Cross Section and Panel Data", pp. 265-268).

The problem is that neither the conditional expectation operator nor the linear projection passes through nonlinear functions. For this reason, only an OLS first stage is guaranteed to produce fitted values that are uncorrelated with the first-stage residuals. A proof can be found in Greene (2008) "Econometric Analysis" or, if you want a more detailed (but also more technical) proof, in the notes by Jean-Louis Arcand, pp. 47-52.
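To make the point about nonlinear functions concrete, here is a small simulation using only the Python standard library. It compares the normal CDF evaluated at the mean of a variable, Phi(E[X]), with the mean of the CDF, E[Phi(X)]. The values of mu and sigma are arbitrary choices for illustration and do not come from any of the references above.

```python
import random
from statistics import NormalDist, fmean

random.seed(0)
std_normal = NormalDist()

# Hypothetical index variable X ~ N(mu, sigma^2); values chosen for illustration.
mu, sigma = 1.0, 2.0
draws = [random.gauss(mu, sigma) for _ in range(50_000)]

# Nonlinear function applied to the expectation: Phi(E[X]).
phi_of_mean = std_normal.cdf(fmean(draws))

# Expectation of the nonlinear function: E[Phi(X)].
mean_of_phi = fmean([std_normal.cdf(x) for x in draws])

# Known closed form for a normal X: E[Phi(X)] = Phi(mu / sqrt(1 + sigma^2)).
closed_form = std_normal.cdf(mu / (1.0 + sigma**2) ** 0.5)

print(f"Phi(E[X]) = {phi_of_mean:.3f}")
print(f"E[Phi(X)] = {mean_of_phi:.3f} (closed form {closed_form:.3f})")
```

The two quantities differ noticeably, which is the reason that plugging first-stage fitted values into a nonlinear second stage does not recover the expectation you actually need.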

For the same reason as in the forbidden regression, the seemingly obvious two-step procedure of mimicking 2SLS with a probit in both stages will not produce consistent estimates either: expectations and linear projections do not pass through nonlinear functions. Wooldridge (2010), section 15.7.3, p. 594, explains this in detail and also describes the proper procedure for estimating probit models with a binary endogenous variable. The correct approach is maximum likelihood, but doing this by hand is not exactly trivial, so it is preferable to use statistical software with a ready-made routine. In Stata, for example, the command is ivprobit (see the Stata manual entry, which also explains the maximum likelihood approach). Note that ivprobit is designed for continuous endogenous regressors; when the endogenous regressor is itself binary, the maximum likelihood model is a bivariate probit, which Stata fits with biprobit.

If you require references for the theory behind probit with instrumental variables see for instance:

  • Newey, W. (1987) "Efficient estimation of limited dependent variable models with endogenous explanatory variables", Journal of Econometrics, Vol. 36, pp. 231-250
  • Rivers, D. and Vuong, Q.H. (1988) "Limited information estimators and exogeneity tests for simultaneous probit models", Journal of Econometrics, Vol. 39, pp. 347-366

Finally, combining different estimation methods in the first and second stages is difficult unless there is a theoretical foundation that justifies it. This is not to say that it is infeasible, though. For instance, Adams et al. (2009) use a three-step procedure with a probit "first stage" and an OLS second stage that does not run into the forbidden regression problem. Their general approach is:

  1. use probit to regress the endogenous variable on the instrument(s) and control variables
  2. use the predicted values from the previous step in an OLS first stage together with the control (but without the instrumental) variables
  3. do the second stage as usual
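The three steps above can be sketched on simulated data as follows. This is a minimal illustration and not the code of Adams et al.: the data-generating process, all coefficient values, and the helper functions solve, ols, probit and fitted are assumptions made up for the example, and the probit is fitted with a hand-written Fisher scoring loop so that only the Python standard library is needed.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def probit(X, d, iters=25):
    """Probit maximum likelihood via Fisher scoring, starting from zero."""
    k = len(X[0])
    b = [0.0] * k
    for _ in range(iters):
        g = [0.0] * k
        H = [[0.0] * k for _ in range(k)]
        for r, di in zip(X, d):
            xb = sum(ri * bi for ri, bi in zip(r, b))
            p = min(max(STD_NORMAL.cdf(xb), 1e-10), 1 - 1e-10)
            phi = STD_NORMAL.pdf(xb)
            s = phi * (di - p) / (p * (1 - p))   # score contribution
            w = phi * phi / (p * (1 - p))        # expected information weight
            for i in range(k):
                g[i] += s * r[i]
                for j in range(k):
                    H[i][j] += w * r[i] * r[j]
        b = [bi + si for bi, si in zip(b, solve(H, g))]
    return b

def fitted(X, coef):
    """Linear index X b for every row."""
    return [sum(xi * ci for xi, ci in zip(row, coef)) for row in X]

# Simulated data (assumed design): control x, instrument z, binary endogenous d.
random.seed(0)
n, beta_true = 5000, 1.0
x, z, d, y = [], [], [], []
for _ in range(n):
    xi, zi, ui = random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)
    vi = 0.8 * ui + 0.6 * random.gauss(0, 1)  # correlated with u, so d is endogenous
    di = 1.0 if 0.5 * xi + zi + vi > 0 else 0.0
    x.append(xi); z.append(zi); d.append(di)
    y.append(1.0 + beta_true * di + 0.5 * xi + ui)

# Step 1: probit of d on the instrument and the control; keep fitted probabilities.
X1 = [[1.0, xi, zi] for xi, zi in zip(x, z)]
p_hat = [STD_NORMAL.cdf(v) for v in fitted(X1, probit(X1, d))]

# Step 2: OLS first stage of d on the control and p_hat (instrument z left out).
X2 = [[1.0, xi, pi] for xi, pi in zip(x, p_hat)]
d_hat = fitted(X2, ols(X2, d))

# Step 3: usual second stage, regressing y on the control and d_hat.
X3 = [[1.0, xi, dh] for xi, dh in zip(x, d_hat)]
beta_three_step = ols(X3, y)[2]

# Naive OLS of y on d for comparison; it ignores the endogeneity of d.
X0 = [[1.0, xi, di] for xi, di in zip(x, d)]
beta_naive = ols(X0, y)[2]

print(f"true effect {beta_true}, three-step {beta_three_step:.3f}, naive OLS {beta_naive:.3f}")
```

Running steps 2 and 3 by hand like this gives the right point estimate but not the right standard errors, so in practice you would pass the fitted probabilities as the instrument to a proper IV routine.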

A similar procedure was used by a Statalist user who wanted a Tobit first stage and a Poisson second stage (see here). The same fix should be feasible for your estimation problem.