Solved – Using predicted probabilities as regressors

Tags: instrumental-variables, interpretation, marginal-effect, probit

I am working on a project where I investigate growth in wages due to migration. I correct for the endogeneity in the decision to migrate (only those who are most likely to gain from migration will migrate) by first using a probit model to predict the probability of migration based on various characteristics. In a second step I then use the predicted probabilities as a proxy for migration (in effect, this is an instrumental variables regression).

My problem is that I get unreasonably high estimates: wages are predicted to increase by up to 200%. My predicted probabilities are very low (3% on average, 25% at the 99th percentile), which is reasonable given that only about 5% of the sample migrate. My concern is that the coefficient I estimate corresponds to an increase in the probability of migrating from 0 to 1, which is a very extreme change relative to the range of predicted probabilities in my sample. Could this be causing the huge estimates? Am I interpreting this correctly? Or should I rather look at the strength of my instruments, etc.?

Best Answer

If you are interested in an approximation of the average partial effect, you could just use a linear probability model in the first stage, i.e. do your instrumental variables estimation via 2SLS in the usual way. Because of the non-linearities involved this is not the most efficient approach, but it can give a good initial idea of the effect under study. For a more in-depth treatment of this argument, see Wooldridge (2010), "Econometric Analysis of Cross Section and Panel Data", section 15.7.3, from page 594 onward. On pages 265-268 he explains the forbidden regression and its problems.
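As a rough illustration of 2SLS with a linear first stage, here is a minimal sketch on simulated data. The data-generating process, the variable names (`x`, `z`, `u`, `d`, `y`) and the helper functions `ols` and `two_sls` are all assumptions made up for this example; they are not taken from the question or from Wooldridge.

```python
import numpy as np


def ols(y, X):
    """Return least-squares coefficients from regressing y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def two_sls(y, d, exog, instruments):
    """2SLS point estimates with a linear (LPM) first stage.

    exog must already contain a constant column. The first returned
    coefficient belongs to the endogenous regressor d.
    """
    first_X = np.column_stack([exog, instruments])
    d_hat = first_X @ ols(d, first_X)               # linear first stage
    return ols(y, np.column_stack([d_hat, exog]))   # second stage


# Simulated data (hypothetical): u is unobserved and drives both d and y,
# which makes d endogenous. The assumed true effect of d on y is 0.3.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)   # exogenous control
z = rng.normal(size=n)   # instrument, excluded from the outcome equation
u = rng.normal(size=n)   # unobserved confounder
d = (0.8 * z + 0.5 * x + u + rng.normal(size=n) > 1.5).astype(float)
y = 1.0 + 0.3 * d + 0.2 * x + 0.5 * u + rng.normal(size=n)

exog = np.column_stack([np.ones(n), x])
beta_ols = ols(y, np.column_stack([d, exog]))   # ignores the endogeneity of d
beta_2sls = two_sls(y, d, exog, z)

print("OLS  effect of d:", round(beta_ols[0], 3))
print("2SLS effect of d:", round(beta_2sls[0], 3))
```

Running the second stage by hand like this gives the 2SLS point estimates, but the standard errors of that second regression are not valid; for inference a dedicated IV routine is the safer choice.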

Another procedure that you might be interested in was used by Adams et al. (2009). They use a three-step procedure where they have a probit "first stage" and an OLS second stage without falling for the forbidden regression problem. Their general approach is:

  1. use probit to regress the endogenous variable on the instrument(s) and exogenous variables
  2. use the predicted probabilities from the previous step as a regressor in an OLS first stage for the endogenous variable, together with the exogenous variables (but without the original instruments)
  3. do the second stage as usual

This procedure yields consistent estimates (IV estimators are in general not unbiased in finite samples) and is generally more efficient than doing 2SLS with a linear probability model in the first stage.
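The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used by Adams et al.: the simulated data, the variable names and the helpers `fit_probit`, `ols` and `probit_iv` are all hypothetical, and the probit is fitted with a small hand-written Fisher scoring loop so that the example needs only NumPy.

```python
import math
import numpy as np


def norm_cdf(v):
    """Standard normal CDF, element-wise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(v / math.sqrt(2.0)))


def norm_pdf(v):
    """Standard normal density, element-wise."""
    return np.exp(-0.5 * v ** 2) / math.sqrt(2.0 * math.pi)


def fit_probit(d, X, n_iter=25):
    """Probit maximum likelihood via Fisher scoring (fixed number of iterations)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        xb = X @ beta
        p = np.clip(norm_cdf(xb), 1e-10, 1 - 1e-10)
        pdf = norm_pdf(xb)
        score = X.T @ (pdf * (d - p) / (p * (1 - p)))
        weights = pdf ** 2 / (p * (1 - p))
        info = X.T @ (X * weights[:, None])   # expected information matrix
        beta = beta + np.linalg.solve(info, score)
    return beta


def ols(y, X):
    """Return least-squares coefficients from regressing y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def probit_iv(y, d, exog, instruments):
    """Three-step estimator; exog must contain a constant column."""
    # Step 1: probit of d on the instrument(s) and exogenous variables.
    W = np.column_stack([exog, instruments])
    p_hat = norm_cdf(W @ fit_probit(d, W))
    # Step 2: OLS first stage of d on p_hat and exog (original instruments dropped).
    first_X = np.column_stack([exog, p_hat])
    d_hat = first_X @ ols(d, first_X)
    # Step 3: second stage. p_hat acts as an instrument here; it is never
    # plugged into the outcome equation in place of d.
    return ols(y, np.column_stack([d_hat, exog]))


# Simulated data (hypothetical); the assumed true effect of d on y is 0.3.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)   # exogenous control
z = rng.normal(size=n)   # instrument
u = rng.normal(size=n)   # unobserved confounder
d = (0.8 * z + 0.5 * x + u + rng.normal(size=n) > 1.5).astype(float)
y = 1.0 + 0.3 * d + 0.2 * x + 0.5 * u + rng.normal(size=n)

exog = np.column_stack([np.ones(n), x])
beta_iv = probit_iv(y, d, exog, z)
print("Three-step effect of d:", round(beta_iv[0], 3))
```

As with any manually computed second stage, only the point estimates are usable; the printed regression's standard errors would need to be replaced by proper IV standard errors.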