Solved – Test for the exogeneity of instrument by regressing it against the residual

econometrics, instrumental-variables

I'm concerned that my OLS regression is biased / inconsistent due to omitted variable bias:

$$Y = \beta_0 + \beta_1 X + \epsilon$$

I'm thinking about using $Z$ as an instrument for $X$. Is there a test for the validity of $Z$ that is based on regressing $Z$ on the residual $e$ (i.e. the estimate of $\epsilon$) from the model above?

This idea sounds too good to be true, but I couldn't figure out exactly what's wrong with it (especially since the Sargan overidentification test also seems to check exogeneity by testing the correlation of the instruments with the residuals).

Note: I understand the idea of the exclusion restriction, that the instrument must be validated by theory, etc. The question is not about how IV works (which many textbooks have described); it's about why the idea above doesn't work. Hopefully such a "lateral" question will help me understand IV more deeply.

Best Answer

Okay, I will be clearer here. We have the following regression model: $$ Y=\alpha+\beta X+u $$ such that $cov(X,u)\neq0.$ Suppose we have a valid instrument, call it $Z.$ We then consider the following first stage: $$ X=\alpha_{1}+\gamma Z+e $$

What you are asking is why we cannot check the instrument's validity by testing its correlation with the residuals directly. The reason is the following: consider the method of moments estimator for OLS. The parameters in the population model will satisfy: $$ E[X'u]=0 $$ Replace this with the sample analogue: $$ X'(Y-X\hat{\beta})=0 $$ You obtain the method of moments OLS estimator as: $$ \hat{\beta}=(X'X)^{-1}(X'Y) $$

Now suppose you have the valid instrument $Z.$ The method of moments estimator for IV uses the moment condition: $$ E[Z'u]=0 $$ Replace it with the sample analogue: $$ Z'(Y-X\hat{\beta}_{IV})=0 $$ which gives $$ \hat{\beta}_{IV}=(Z'X)^{-1}(Z'Y) $$
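The IV moment condition above can be checked numerically. The following is a minimal sketch in Python with NumPy; the data-generating process, coefficient values, and variable names are all assumptions made up for illustration. The instrument is deliberately made invalid (correlated with $u$) to show that the sample moment $Z'(Y-X\hat{\beta}_{IV})$ is zero anyway, simply because the estimator is defined as the solution of that equation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: z is deliberately an INVALID instrument (correlated with u).
z = rng.normal(size=n)
u = 0.8 * z + rng.normal(size=n)            # cov(z, u) != 0
x = 1.0 + 0.5 * z + 0.7 * u + rng.normal(size=n)
y = 2.0 + 1.5 * x + u                       # true slope is 1.5

X = np.column_stack([np.ones(n), x])        # regressors, including a constant
Z = np.column_stack([np.ones(n), z])        # instruments, including a constant

# Just-identified IV estimator: the exact solution of Z'(y - X b) = 0.
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
resid_iv = y - X @ beta_iv

# Sample moments Z'e / n: zero up to floating-point error.
moments = Z.T @ resid_iv / n

# Slope from regressing the IV residuals on the instrument: also numerically zero.
slope = np.linalg.lstsq(Z, resid_iv, rcond=None)[0][1]

print("beta_iv:", beta_iv)                  # slope is far from 1.5 because z is invalid
print("Z'e / n:", moments)
print("slope of residual on z:", slope)
```

The regression of the residuals on $z$ returns a slope of numerically zero even though the IV estimate itself is badly off, so this check carries no information about whether $z$ is exogenous.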

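If you regress the IV residuals $\hat{u}_{IV}=Y-X\hat{\beta}_{IV}$ on the instrument, you will always fail to reject the null, because the estimator was constructed so that $Z'\hat{u}_{IV}=0$ holds exactly in the sample. The Sargan test exploits the fact that this is no longer the case when you have more instruments than endogenous regressors: the sample moments can then not all be set to zero at once, and how far they are from zero is informative. Put differently, with a single instrument the construction of the estimator is based on the same thing you want to test!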
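To illustrate the contrast between the just-identified and overidentified cases, here is a rough sketch of the Sargan statistic in its $n R^2$ form: the 2SLS residuals are regressed on all instruments, and $n$ times the $R^2$ of that regression is compared with a $\chi^2$ distribution whose degrees of freedom equal the number of instruments minus the number of endogenous regressors (under homoskedasticity). The helper name `sargan_stat` and the simulated data are assumptions for illustration only, not a reference implementation.

```python
import numpy as np

def sargan_stat(y, X, Z):
    """n * R^2 from regressing 2SLS residuals on the instruments Z.

    X and Z are assumed to both contain a constant column.
    """
    # First stage: fitted values of X from the projection on Z.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: 2SLS coefficients; residuals use the original X.
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    e = y - X @ beta
    # Auxiliary regression of the residuals on all instruments.
    e_fit = Z @ np.linalg.lstsq(Z, e, rcond=None)[0]
    r2 = 1.0 - np.sum((e - e_fit) ** 2) / np.sum((e - e.mean()) ** 2)
    return len(y) * r2

rng = np.random.default_rng(1)
n = 5000
z1 = rng.normal(size=n)                     # valid: independent of u
z2 = rng.normal(size=n)
u = 0.8 * z2 + rng.normal(size=n)           # z2 is invalid: cov(z2, u) != 0
x = 1.0 + 0.5 * z1 + 0.5 * z2 + 0.7 * u + rng.normal(size=n)
y = 2.0 + 1.5 * x + u

const = np.ones(n)
X = np.column_stack([const, x])

# One instrument (the invalid one): the statistic is zero by construction.
stat_just = sargan_stat(y, X, np.column_stack([const, z2]))
# Two instruments, one endogenous regressor: one overidentifying restriction.
stat_over = sargan_stat(y, X, np.column_stack([const, z1, z2]))

print("just-identified:", stat_just)
print("overidentified: ", stat_over)        # compare with 3.841, the 5% chi2(1) critical value
```

With only the invalid instrument the statistic is numerically zero, while adding a second instrument lets the test detect that the two instruments disagree. Note that a rejection says at least one instrument is invalid, not which one.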
