How to combine Heckman selection and a binary endogenous variable in a two-step way

Tags: endogeneity, maximum likelihood, multiple regression, stata, two-step-estimation

I want to fit a probit model with a binary endogenous regressor and a Heckman-type sample selection problem. The model looks something like this:

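$Y_1 = X\beta + \gamma Y_2 + u$, $Y_2 = Z\pi + X\delta + e$, where $Y_1$ is observed only if $Z_1\alpha + v > 0$ (both $Y_1$ and $Y_2$ are binary, so the first two are latent-index probit equations).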
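To make the setup concrete, here is a small Python simulation of one possible data-generating process for this model (Y1 the outcome, Y2 the binary endogenous regressor, Z the instrument, Z1 the selection variable). It is a sketch under assumptions that are not in the question: standard normal errors, arbitrary coefficient values, and the hypothetical parameters `rho_endog` (correlation between the outcome error and the first-stage error) and `rho_sel` (correlation between the outcome error and the selection error). A simulator like this makes it easy to check whether a proposed shortcut recovers the known coefficient on Y2.

```python
import math
import random


def simulate(n, rho_endog=0.5, rho_sel=0.5, seed=0):
    """Simulate a latent-index version of the model (illustrative coefficients only).

    y2 = 1[0.8*z + 0.5*x + e > 0]          first stage: binary endogenous regressor
    y1 = 1[0.5*x + 1.0*y2 + u > 0]         outcome probit
    y1 is observed only if 0.7*z1 + v > 0  selection equation
    corr(u, e) = rho_endog makes y2 endogenous; corr(u, v) = rho_sel creates
    sample selection. All errors are standard normal.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x, z, z1 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        u = rng.gauss(0, 1)
        # Unit-variance errors correlated with u (and therefore with each other).
        e = rho_endog * u + math.sqrt(1 - rho_endog ** 2) * rng.gauss(0, 1)
        v = rho_sel * u + math.sqrt(1 - rho_sel ** 2) * rng.gauss(0, 1)
        y2 = int(0.8 * z + 0.5 * x + e > 0)
        selected = 0.7 * z1 + v > 0
        # None marks an unobserved outcome (the row is still in the data for the selection step).
        y1 = int(0.5 * x + 1.0 * y2 + u > 0) if selected else None
        rows.append({"x": x, "z": z, "z1": z1, "y2": y2, "y1": y1})
    return rows


rows = simulate(5000)
share_observed = sum(r["y1"] is not None for r in rows) / len(rows)
print(f"share of rows with y1 observed: {share_observed:.3f}")
```

Because the selection index has no intercept here, roughly half of the outcomes are observed; changing `rho_endog` and `rho_sel` lets you see how strongly endogeneity and selection distort naive estimates.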

This question is not entirely new; there is a similar post on Stack Exchange. However, the existing answer, which uses the user-written command -cmp-, runs into difficulties in practice. The MLE approach is quite fragile once the model is moderately complicated, and convergence is hard to achieve with -cmp- (it gives error messages such as "cannot compute an improvement -- discontinuous region encountered" and "convergence not achieved"). That said, I don't think this is a Stata-specific issue.
What I want to know is whether the following two-step procedure is appropriate: first estimate the first-stage equation for Y2 (Y2 on Z and X), obtain the predicted values of Y2, and then plug them into a -heckprob- model to get the estimates. Does this approach fall into the trap of the "forbidden regression"?
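One reason plug-in procedures like this are suspect is that a probit is nonlinear: given a fitted value, the model implies the average of Φ over the remaining first-stage error, whereas a plug-in second stage evaluates Φ at the fitted index, and these differ. The stylized Python sketch below assumes a continuous y2 = y2_hat + v with v ~ N(0, 1) and hypothetical values y2_hat = 0.5 and gamma = 1.5. It only illustrates this Jensen-type gap; it does not reproduce -heckprob-, and a binary y2 with correlated errors adds further complications.

```python
import math
import random


def phi_cdf(t):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


# Hypothetical numbers: y2 = y2_hat + v with v ~ N(0, 1); outcome index is gamma * y2.
y2_hat, gamma = 0.5, 1.5

# What a plug-in second stage effectively uses: Phi evaluated at the fitted index.
plug_in = phi_cdf(gamma * y2_hat)

# What the model implies given y2_hat: E[Phi(gamma * (y2_hat + v))], by Monte Carlo.
rng = random.Random(1)
draws = 100_000
monte_carlo = sum(phi_cdf(gamma * (y2_hat + rng.gauss(0, 1))) for _ in range(draws)) / draws

# Closed form for normal v: E[Phi(a + b*v)] = Phi(a / sqrt(1 + b**2)).
closed_form = phi_cdf(gamma * y2_hat / math.sqrt(1 + gamma ** 2))

print(f"plug-in {plug_in:.3f}  vs  E[Phi] {monte_carlo:.3f} (closed form {closed_form:.3f})")
```

With these numbers the plug-in probability is about 0.77 while the implied average probability is about 0.66, so a second stage built on fitted values is fitting the wrong conditional mean.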

Other suggestions and comments are equally welcome, for example how to improve the chance of convergence with -cmp- (following Roodman's suggestion, I already add the tech(dfp nr) option at the end of the command).

Another question: in a SUR-type model such as -cmp- or -biprobit-, when fitting the first stage as in 2SLS, do we need to specify both the endogenous variable and its interaction terms on the left-hand side, or just the endogenous variable itself?

Best Answer

I think using a nonlinear estimator (probit) to produce the first stage of your two-stage estimate is not appropriate. Note that 2SLS is one instance of the broader class of control function approaches. Even with standard two-stage linear approaches, your standard errors will be wrong unless they are properly adjusted.
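In the linear case, both the control-function view of 2SLS and the standard-error point can be checked numerically. The pure-Python sketch below uses a hypothetical linear data-generating process (one instrument z, error correlation 0.8, true slope 2.0, all made-up values). It shows that (a) regressing y1 on y2 plus the first-stage residual gives exactly the 2SLS slope, and (b) the residual variance from a naive second-stage OLS, which uses the fitted y2_hat instead of y2, differs markedly from the correct 2SLS error variance, which is why second-stage standard errors need adjusting.

```python
import random


def ols(columns, y):
    """OLS coefficients from the normal equations (X'X) b = X'y, via Gauss-Jordan elimination."""
    k = len(columns)
    xtx = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)] for i in range(k)]
    xty = [sum(a * b for a, b in zip(columns[i], y)) for i in range(k)]
    m = [row + [rhs] for row, rhs in zip(xtx, xty)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


# Hypothetical linear DGP: z is a valid instrument, corr(u, v) = 0.8 makes y2 endogenous.
rng = random.Random(42)
n = 20_000
z = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.gauss(0, 1) for _ in range(n)]
u = [0.8 * vi + 0.6 * rng.gauss(0, 1) for vi in v]  # var(u) = 0.64 + 0.36 = 1
y2 = [1.0 + zi + vi for zi, vi in zip(z, v)]
y1 = [0.5 + 2.0 * y2i + ui for y2i, ui in zip(y2, u)]  # true slope on y2 is 2.0
one = [1.0] * n

# First stage: y2 on (1, z); keep fitted values and residuals.
a0, a1 = ols([one, z], y2)
y2_hat = [a0 + a1 * zi for zi in z]
v_hat = [y2i - hi for y2i, hi in zip(y2, y2_hat)]

# 2SLS by hand: y1 on (1, y2_hat).
b0_2sls, b1_2sls = ols([one, y2_hat], y1)

# Control function: y1 on (1, y2, v_hat); the coefficient on y2 equals the 2SLS slope.
_, b1_cf, _ = ols([one, y2, v_hat], y1)

# Plain OLS ignores the endogeneity.
b1_ols = ols([one, y2], y1)[1]

# Residual variances: the naive second stage uses y2_hat, proper 2SLS uses the actual y2.
sigma2_naive = sum((yi - b0_2sls - b1_2sls * hi) ** 2 for yi, hi in zip(y1, y2_hat)) / (n - 2)
sigma2_2sls = sum((yi - b0_2sls - b1_2sls * y2i) ** 2 for yi, y2i in zip(y1, y2)) / (n - 2)

print(f"OLS {b1_ols:.3f}  2SLS {b1_2sls:.3f}  control function {b1_cf:.3f}")
print(f"residual variance: naive {sigma2_naive:.2f}  vs  2SLS {sigma2_2sls:.2f}")
```

With this DGP, plain OLS is pulled toward 2 + 0.8/2 = 2.4 because y2 is correlated with u, while the 2SLS and control-function slopes coincide and stay near 2.0. The equality is algebraic: v_hat is orthogonal to the constant and to y2_hat, so adding it to the regression leaves the coefficients on (1, y2) equal to those from (1, y2_hat). This exact equivalence is specific to the linear case; with a probit second stage, plugging in fitted values no longer matches a proper control function.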