Solved – Mixing instruments in ivreg2 estimation in Stata

2sls, instrumental-variables, regression, stata

When running a 2SLS estimation with ivreg2 and more than one endogenous variable, Stata seems to necessarily instrument all endogenous variables with the same set of instruments. That is, if I have instruments $z_1$ and $z_2$ for the first endogenous variable, $y_1$, and instruments $z_3$ and $z_4$ for the second, $y_2$, Stata runs the first-stage regression for $y_1$ on all the instruments ($z_1$, $z_2$, $z_3$ and $z_4$), and does the same for $y_2$. Is there a built-in command to deal with this kind of problem?

If no such option exists in Stata, I know I could use the solution created by Kit Baum and provided by user Andy here. I've adapted the code for two endogenous variables and calculated the correct standard errors by adding vecdiag(cholesky(diag(vecdiag(vcorr)))) and adapting it into a variable. But I haven't yet been able to reproduce the additional statistics that ivreg2 reports, that is, the Kleibergen-Paap rk LM statistic, the Cragg-Donald Wald F statistic or the Kleibergen-Paap rk Wald F statistic (along with the Stock-Yogo critical values), and the Hansen J statistic. I would be glad if someone could point me to some additional code (or functions) that could at least partially supply the test statistics missing from my output.

Best Answer

What Stata does is perfectly fine, and it is not a problem. Consistent estimation of IV/2SLS regressions requires that all the instruments appear in all first stages. In the two first stages you should see that $z_1$ and $z_2$ have the largest effect for $y_1$, whilst for $y_2$ it should be $z_3$ and $z_4$. The test statistics produced by ivreg2 are valid for your estimation problem, and some of them are designed specifically for this type of regression, such as the Angrist-Pischke F-statistic or the Stock-Yogo critical values for the Cragg-Donald statistic. This is also described in the ivreg2 help file.
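As a rough illustration of the point above, the following sketch simulates hypothetical data and fits the model with both endogenous variables instrumented by the full instrument set. The data-generating process, the seed, and the names depvar and x1 are assumptions made for the example and do not come from the question. ivreg2 is a user-written command and is assumed to be installed (for instance via ssc install ivreg2, together with ranktest).

```stata
* Hypothetical simulated data; all coefficients and names are illustrative
clear
set seed 12345
set obs 1000
generate z1 = rnormal()
generate z2 = rnormal()
generate z3 = rnormal()
generate z4 = rnormal()
generate x1 = rnormal()
* u is the unobserved confounder that makes y1 and y2 endogenous
generate u  = rnormal()
generate y1 = 0.8*z1 + 0.5*z2 + 0.5*u + rnormal()
generate y2 = 0.7*z3 + 0.6*z4 - 0.4*u + rnormal()
generate depvar = 1 + 2*y1 - y2 + 0.5*x1 + u

* Both endogenous regressors share the full instrument list z1-z4;
* -first- prints the complete first-stage regressions for y1 and y2
ivreg2 depvar x1 (y1 y2 = z1 z2 z3 z4), robust first
```

The first option prints the full first-stage regressions, so the relative strength of each instrument for $y_1$ and $y_2$ can be checked directly. With robust, ivreg2 reports the Kleibergen-Paap statistics and the Hansen J statistic alongside the Cragg-Donald Wald F statistic.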

If you still have your heart set on estimating the first stages separately, you need to use a multiple-equation model and estimate it by 3SLS instead (see this post on Statalist regarding this topic).
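A minimal sketch of the multiple-equation route, again on hypothetical simulated data, might look as follows. reg3 is Stata's built-in three-stage least squares command; by default it treats the left-hand-side variables of the listed equations as endogenous and every other variable as exogenous. The equation specification, the variable names depvar and x1, and the data-generating process are assumptions for illustration only. This is one possible reading of the suggestion above, not the setup from the Statalist post.

```stata
* Hypothetical simulated data; all coefficients and names are illustrative
clear
set seed 12345
set obs 1000
generate z1 = rnormal()
generate z2 = rnormal()
generate z3 = rnormal()
generate z4 = rnormal()
generate x1 = rnormal()
generate u  = rnormal()
generate y1 = 0.8*z1 + 0.5*z2 + 0.5*u + rnormal()
generate y2 = 0.7*z3 + 0.6*z4 - 0.4*u + rnormal()
generate depvar = 1 + 2*y1 - y2 + 0.5*x1 + u

* One equation per endogenous variable: the y1 equation lists only z1 and z2,
* the y2 equation lists only z3 and z4, and all three are estimated jointly
reg3 (depvar y1 y2 x1) (y1 z1 z2 x1) (y2 z3 z4 x1), 3sls
```

Here the exclusion of $z_3$ and $z_4$ from the $y_1$ equation (and of $z_1$ and $z_2$ from the $y_2$ equation) is imposed as a restriction on the system, which is different from dropping those instruments from a 2SLS first stage.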
