Econometrics – Standard Errors of Two Stage Least Squares in Stata

econometrics, instrumental-variables, standard error, stata

I use Stata. I am trying to replicate the output of ivreg by running the two stages manually: estimating the first stage, predicting the fitted values of the endogenous regressor, and then running the second-stage regression with those fitted values in place of the endogenous regressor in the structural model.
As expected, the standard errors from my second-stage regression do not account for the fact that I am using an estimated regressor, so they differ from those reported by the ivreg command.
My question is: how can I obtain reliable inference without using the ivreg command? Is there an option I should add to the second-stage regression to get reliable standard errors? If not, how can I obtain reliable standard errors starting from the manual second-stage regression?

Best Answer

The relevant formula is $$\mathbb{Var}(\beta_{IV})=\sigma^2 \cdot (X'P_{Z}X)^{-1},$$ where $$\sigma^2 = (y-X\beta_{IV})'(y-X\beta_{IV})/(n-k_{SS}),$$

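and $$P_Z = Z \, (Z'Z)^{-1} Z'.$$ Here $X$ holds the original second-stage regressors (the endogenous variable itself, not its fitted values, together with the exogenous regressors and the constant), $Z$ holds the excluded instruments together with the exogenous regressors and the constant, and $k_{SS}$ is the number of regressors in the second stage. Some people use $n$ or $n-k_{FS}$ in the denominator instead, since the choice does not matter asymptotically.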
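
To see why only the scale of the hand-computed VCE is wrong, the following short derivation may help. It writes $\hat{X} = P_Z X$ for the regressor matrix used in the manual second stage and $s^2$ for the residual variance that regression reports; both symbols are introduced here for illustration only. Because $P_Z$ is symmetric and idempotent,

$$\hat{X}'\hat{X} = X'P_Z'P_Z X = X'P_Z X.$$

The manual second stage therefore reports

$$\widehat{\mathbb{Var}}_{\text{by hand}} = s^2 \cdot (X'P_Z X)^{-1}, \qquad s^2 = (y-\hat{X}\beta_{IV})'(y-\hat{X}\beta_{IV})/(n-k_{SS}),$$

which has the right matrix but the wrong scalar, since its residuals are built from $\hat{X}$ instead of $X$. Rescaling by the ratio of the two variance estimates recovers the formula above:

$$\mathbb{Var}(\beta_{IV}) = \frac{\sigma^2}{s^2} \cdot \widehat{\mathbb{Var}}_{\text{by hand}}.$$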

Kit Baum has code in this thread on Old Statalist. I've tweaked it slightly to use ivregress rather than ivreg2:

// how to fix 2SLS estimates done 'by hand'
sysuse auto, clear
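// 'small' applies the n-k degrees-of-freedom adjustment, so the VCE is comparable to regress
ivregress 2sls price headroom (weight = turn foreign), small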
estat vce
di e(rmse)
mat v2sls = e(V)
  
// First stage reg
qui reg weight turn foreign headroom
predict double what, xb

// Second stage reg
qui reg price what headroom
scalar rmsebyhand = e(rmse)

// the 'wrong' VCE: its s^2 comes from residuals built with the fitted values (what)
mat vbyhand = e(V)
scalar dfk = e(df_r)

// the correct squared resids: orig regressors (weight, not what) * second stage coeffs
gen double eps2 = (price - _b[what]*weight - _b[headroom]*headroom - _b[_cons])^2
qui su eps2

// corrected RMSE, based on the correct resids
scalar rmsecorr = sqrt(r(sum) / dfk)

// corrected VCE, using the right s^2
mat vcorr = (rmsecorr / rmsebyhand)^2 * vbyhand
mat li vcorr

// check to see that it equals the real 2SLS VCE
mat diff = v2sls - vcorr
mat li diff
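
As a cross-check, the variance formula can also be evaluated directly in Mata instead of rescaling the second-stage VCE. This is only a sketch under stated assumptions: it uses the same auto data and model as above, the Mata names (y, X, Z, Xhat, XPX, b, e, s2, V) are illustrative, and it divides by n - k to match ivregress with the small option.

```stata
sysuse auto, clear
ivregress 2sls price headroom (weight = turn foreign), small

mata:
    // outcome, original regressors X, and full instrument set Z (each with a constant)
    y = st_data(., "price")
    n = rows(y)
    X = st_data(., ("weight", "headroom")), J(n, 1, 1)
    Z = st_data(., ("turn", "foreign", "headroom")), J(n, 1, 1)

    // Xhat = P_Z X, so Xhat'Xhat = X'P_Z X
    Xhat = Z * invsym(cross(Z, Z)) * cross(Z, X)
    XPX  = cross(Xhat, Xhat)

    // 2SLS coefficients, then residuals from the ORIGINAL regressors
    b  = invsym(XPX) * cross(Xhat, y)
    e  = y - X * b
    s2 = cross(e, e) / (n - cols(X))

    // sigma^2 * (X'P_Z X)^-1
    V = s2 * invsym(XPX)
    V

    // relative difference from the VCE stored by ivregress
    mreldif(V, st_matrix("e(V)"))
end
```

If the two calculations agree, the last line should print a value close to zero. The column order of X (weight, headroom, constant) is chosen to line up with the order of e(V) from ivregress.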