Basic 2SLS IV Questions in Stata

Tags: 2sls, econometrics, instrumental-variables, stata

(1) If I believe my instrument is exogenous conditional upon a few exogenous variables, do I include them only in the first stage? I.e. would the command be:

ivregress 2sls Y (X = inst1 inst2 exog1 exog2) exog3 exog4

Here Y is the dependent variable, X is an endogenous variable, inst1 and inst2 are the instruments for X, exog1 and exog2 are what make the instruments exogenous to the error in the second stage, and exog3 and exog4 are second-stage variables that only improve precision. Would I need to include exog1 and exog2 outside the parentheses as well?

(2) If I want to interact the endogenous variable X with exog3, how should I instrument X and the interaction? As follows?

ivregress 2sls Y (X*exog3 = inst1 inst2 exog1 exog2) exog4

Best Answer

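You need to include all your exogenous variables in both the first and the second stage; otherwise you may end up with biased (inconsistent) estimates. In ivregress, every variable listed outside the parentheses is an exogenous regressor: it enters the second stage and is automatically used in the first stage as well. Variables listed only after the equals sign inside the parentheses are treated as excluded instruments, i.e. they are assumed to affect Y only through X. For a discussion of why having some exogenous variables in the first but not in the second stage is problematic, see here. Given your setup, the correct Stata syntax is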
ivregress 2sls Y exog1 exog2 exog3 exog4 (X = inst1 inst2)
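A short derivation makes the problem explicit. It is a sketch assuming the outcome equation is $Y_i = \alpha + \beta_1 \text{exog1}_i + \beta_2 \text{exog2}_i + \beta_3 \text{exog3}_i + \beta_4 \text{exog4}_i + \gamma X_i + \epsilon_i$, that all four controls are uncorrelated with $\epsilon_i$, that $\beta_1 \neq 0$, and that inst1 and inst2 are uncorrelated with $\epsilon_i$ once exog1 and exog2 are in the equation. Putting exog1 and exog2 only inside the parentheses instead estimates

$$\begin{aligned} Y_i &= \alpha + \beta_3 \text{exog3}_i + \beta_4 \text{exog4}_i + \gamma X_i + v_i, \\ v_i &= \beta_1 \text{exog1}_i + \beta_2 \text{exog2}_i + \epsilon_i, \end{aligned}$$

while treating exog1 and exog2 as excluded instruments. 2SLS is consistent only if every instrument $z_i$ satisfies $E[z_i v_i] = 0$, but

$$E[\text{exog1}_i\, v_i] = \beta_1 E[\text{exog1}_i^2] + \beta_2 E[\text{exog1}_i\, \text{exog2}_i],$$

which is nonzero in general. The actual instruments suffer too: by the premise of the question, inst1 and inst2 are exogenous only conditional on exog1 and exog2, so they may be correlated with them, and

$$E[\text{inst1}_i\, v_i] = \beta_1 E[\text{inst1}_i\, \text{exog1}_i] + \beta_2 E[\text{inst1}_i\, \text{exog2}_i]$$

need not be zero. Keeping exog1 and exog2 outside the parentheses keeps them out of the error term, so only $\epsilon_i$ remains there.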

As a side note: instead of ivregress you might want to use ivreg2, a user-written command available from SSC (ssc install ivreg2) that reports many more diagnostic statistics for 2SLS models.
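The point about controls can be checked with a small simulation. This is a sketch: the data-generating process, seed, sample size and coefficient values are invented for illustration, and inst1 and inst2 are deliberately built to be correlated with exog1 and exog2, so they are valid only once those controls are in the model. The first command is the recommended specification; the second is the command from question (1). `estat firststage` and `estat overid` are ivregress's built-in counterparts to some of the diagnostics ivreg2 reports. Run it as a do-file, since `//` comments only work there.

```stata
* Hypothetical simulated data (all values are illustrative assumptions)
clear
set seed 2024
set obs 20000
generate exog1 = rnormal()
generate exog2 = rnormal()
generate exog3 = rnormal()
generate exog4 = rnormal()
generate u     = rnormal()                  // unobserved confounder of X and Y
generate inst1 = 0.7*exog1 + rnormal()      // valid only conditional on exog1
generate inst2 = 0.7*exog2 + rnormal()      // valid only conditional on exog2
generate X = 0.8*inst1 + 0.8*inst2 + u + rnormal()
* True effect of X on Y is 2; exog1 and exog2 also affect Y directly
generate Y = 1 + 2*X + exog1 + exog2 + 0.5*exog3 + 0.5*exog4 + u + rnormal()

* Recommended: all exogenous variables outside the parentheses
ivregress 2sls Y exog1 exog2 exog3 exog4 (X = inst1 inst2)
scalar b_right = _b[X]
estat firststage     // first-stage F statistic and partial R-squared
estat overid         // Sargan and Basmann tests (2 instruments, 1 endogenous)

* Question (1): exog1 and exog2 only inside the parentheses
ivregress 2sls Y exog3 exog4 (X = inst1 inst2 exog1 exog2)
scalar b_wrong = _b[X]

display "gamma, controls in both stages:   " %6.3f scalar(b_right)
display "gamma, exog1/exog2 in first only: " %6.3f scalar(b_wrong)
```

For this design the first estimate should sit close to 2, while the second converges to roughly 2.6 in large samples, because exog1 and exog2 end up in the error term and are correlated with the instrument set.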

For the interaction of the endogenous variable with exog3, you also need to interact the instruments with exog3. In a model like

$$Y_i = \alpha + \beta_1 \text{exog1}_i + \beta_2 \text{exog2}_i + \beta_3 \text{exog3}_i + \beta_4 \text{exog4}_i + \gamma X_i + \epsilon_i$$

you instrument $X$ by running the first stage

$$X_i = a + \rho_1 \text{exog1}_i + \rho_2 \text{exog2}_i + \rho_3 \text{exog3}_i + \rho_4 \text{exog4}_i + \phi_1 \text{inst1}_i + \phi_2 \text{inst2}_i + e_i$$

and then using its fitted values in the second stage. In the same spirit, if inst1 and inst2 are valid instruments for $X$, then $\text{inst1} \cdot \text{exog3}$ and $\text{inst2} \cdot \text{exog3}$ are valid instruments for $X \cdot \text{exog3}$. The model with the interaction, keeping the main effect of $X$ as the Stata code below does, is

$$Y_i = \alpha + \beta_1 \text{exog1}_i + \beta_2 \text{exog2}_i + \beta_3 \text{exog3}_i + \beta_4 \text{exog4}_i + \gamma_1 X_i + \gamma_2 (X_i \cdot \text{exog3}_i) + \eta_i$$

It has two endogenous regressors, $X_i$ and $X_i \cdot \text{exog3}_i$, and each gets its own first stage on all exogenous variables and all four instruments. For the interaction this is

$$\begin{aligned} (X_i \cdot \text{exog3}_i) &= c + \delta_1 \text{exog1}_i + \delta_2 \text{exog2}_i + \delta_3 \text{exog3}_i + \delta_4 \text{exog4}_i + \lambda_1 \text{inst1}_i + \lambda_2 \text{inst2}_i \\ &\quad + \psi_1 (\text{inst1}_i \cdot \text{exog3}_i) + \psi_2 (\text{inst2}_i \cdot \text{exog3}_i) + u_i \end{aligned}$$

and the first stage for $X_i$ likewise gains the regressors $\text{inst1}_i \cdot \text{exog3}_i$ and $\text{inst2}_i \cdot \text{exog3}_i$.

In Stata, the simplest way is to generate the interactions by hand:

gen Xexog3 = X*exog3            // endogenous interaction term
gen inst1exog3 = inst1*exog3    // instruments for the interaction
gen inst2exog3 = inst2*exog3
ivregress 2sls Y exog1 exog2 exog3 exog4 (X Xexog3 = inst1 inst2 inst1exog3 inst2exog3)
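To check that this specification recovers an interaction effect, here is a sketch with simulated data. The data-generating process (true coefficients 2 on X and 1.5 on X·exog3, an unobserved confounder u that drives both X and Y) is an assumption made up for illustration; the estimation lines are the ones above. Run it as a do-file.

```stata
* Hypothetical simulated data with a known interaction effect
clear
set seed 2024
set obs 20000
generate exog1 = rnormal()
generate exog2 = rnormal()
generate exog3 = rnormal()
generate exog4 = rnormal()
generate u     = rnormal()                  // unobserved confounder
generate inst1 = rnormal()
generate inst2 = rnormal()
generate X = 0.8*inst1 + 0.8*inst2 + 0.3*exog3 + u + rnormal()
* True model: coefficient 2 on X and 1.5 on X*exog3
generate Y = 1 + 2*X + 1.5*X*exog3 + exog1 + exog2 + exog3 + exog4 + u + rnormal()

* Interactions generated by hand, as above
generate Xexog3     = X*exog3
generate inst1exog3 = inst1*exog3
generate inst2exog3 = inst2*exog3

* OLS for comparison: biased because u enters both X and Y
regress Y X Xexog3 exog1 exog2 exog3 exog4

* 2SLS with both X and Xexog3 treated as endogenous
ivregress 2sls Y exog1 exog2 exog3 exog4 (X Xexog3 = inst1 inst2 inst1exog3 inst2exog3)
estat firststage     // first-stage statistics for both endogenous regressors
```

The 2SLS coefficients on X and Xexog3 should land close to 2 and 1.5, while the OLS coefficient on X is pushed away from 2 by the confounder.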

This type of question has been asked before on Statalist; if you are interested in further discussion of the problem, have a look here.