2SLS – How to Use 2SLS with Second Stage Probit?

2sls, instrumental-variables, probit, stata

I am trying to use instrumental variables analysis to infer causality with observational data.

I have come across two-stage least squares (2SLS) regression, which is likely to address the endogeneity issue in my research. However, I would like the first stage to be OLS and the second stage to be probit. In my reading so far, I have seen researchers use either plain 2SLS or a first-stage probit with a second-stage OLS, but not the other way round, which is what I am trying to achieve.

I am currently using Stata, and the ivreg command in Stata only does a straight 2SLS.

Best Answer

Your case is less problematic than the other way round. The expectation and linear projection operators pass through a linear first stage (e.g. OLS) but not through non-linear ones like probit or logit. Therefore it's not a problem if you first regress your continuous endogenous variable $X$ on your instrument(s) $Z$, $$X_i = a + Z'_i\pi + \eta_i$$ and then use the fitted values in a probit second stage to estimate $$\text{Pr}(Y_i=1|\widehat{X}_i) = \text{Pr}(\beta\widehat{X}_i + \epsilon_i > 0)$$
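To make the reasoning a bit more explicit, here is a short derivation of what the second-stage probit estimates. It is only a sketch under assumptions that are not stated above: a latent-variable form $Y_i = 1[Y_i^* > 0]$ with $Y_i^* = \beta X_i + \epsilon_i$, and errors $(\eta_i, \epsilon_i)$ that are jointly normal with mean zero and independent of $Z_i$. The symbol $\sigma$ is introduced here for illustration.

$$Y_i^* = \beta X_i + \epsilon_i = \beta(a + Z'_i\pi) + (\beta\eta_i + \epsilon_i)$$

$$\text{Pr}(Y_i=1|Z_i) = \Phi\left(\frac{\beta(a + Z'_i\pi)}{\sigma}\right), \qquad \sigma^2 = \text{Var}(\beta\eta_i + \epsilon_i)$$

Under these assumptions the composite error $\beta\eta_i + \epsilon_i$ is still normal, which is why the probit form survives the substitution. The coefficient on $\widehat{X}_i$ then estimates $\beta/\sigma$ rather than $\beta$ itself, so its sign is informative but its magnitude is rescaled.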

The second-stage standard errors won't be right, though, because $\widehat{X}_i$ is an estimated quantity rather than observed data, and the probit ignores the sampling error carried over from the first stage. You can correct this by bootstrapping both the first and second stage together. In Stata this would be something like

// use a toy data set as example
webuse nlswork

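// set up the program including 1st and 2nd stage
capture program drop my2sls
program my2sls
    // 1st stage: OLS of the endogenous variable on the instrument and controls
    reg grade age race tenure
    predict grade_hat, xb

    // 2nd stage: probit on the fitted values and the same controls
    probit union grade_hat age race
    drop grade_hat
end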

// obtain bootstrapped standard errors
bootstrap, reps(100): my2sls

In this example we want to estimate the effect of years of education on the probability of being in a labor union. Given that years of education is likely to be endogenous, we instrument it with years of tenure in the first stage. Of course, this doesn't make any sense in terms of interpretation, but it illustrates the code.

Just make sure that you use the same exogenous control variables in both the first and second stage. In the above example those are age and race, whereas the (nonsensical) instrument tenure appears only in the first stage.
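One way to guard against the two stages drifting apart is to keep the shared controls in a single local macro, so both regressions are guaranteed to use the same list. The following is only a sketch on the same toy data. The program name my2sls_sketch, the local controls, the statistic name b_grade and the seed value are made up for illustration.

```stata
// sketch only: same toy data and variables as the example above
webuse nlswork, clear

capture program drop my2sls_sketch
program define my2sls_sketch
    // shared exogenous controls, used in BOTH stages
    local controls age race

    // 1st stage: the instrument (tenure) enters here only
    capture drop grade_hat
    regress grade tenure `controls'
    predict double grade_hat, xb

    // 2nd stage: probit on the fitted values plus the same controls
    probit union grade_hat `controls'
    drop grade_hat
end

// bootstrap both stages together; report only the coefficient of interest
bootstrap b_grade=(_b[grade_hat]), reps(100) seed(12345): my2sls_sketch
```

Setting seed() makes the bootstrap replications reproducible, and naming the expression keeps the output down to the one coefficient on the fitted values.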