Solved – Endogeneity test instrumental variables


I'm reading a paper that uses the following endogeneity test:

  1. First of all, we have the initial linear model: $$y = \beta_0 + \beta_1x_1 +
    \beta_2x_2 + \beta_3x_3 + e$$ $x_3$ is the endogenous regressor and $z$ is
    the instrument.
  2. We regress the endogenous regressor on the instrument and the
    exogenous regressors: $$x_3 = b_0 + b_1x_1 +b_2x_2 + b_3z + u$$
  3. We recover the residual $u$ of the linear regression from the previous
    step. Then we estimate the following linear model: $$y = \beta_0 + \beta_1 x_1 +
    \beta_2 x_2 + \beta_3 x_3 + \rho u + e$$
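The three steps can be sketched in NumPy as below. This is a minimal illustration, not the paper's code: the helper names `ols` and `control_function_steps` and every simulation parameter are hypothetical, and the data are simulated so that $x_3$ is actually exogenous. In that setting the step 1 and step 3 coefficients should come out close and $\hat\rho$ near zero, which is the pattern the paper reads as "no endogeneity".

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients and residuals for y = X @ coef + error."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

def control_function_steps(y, x1, x2, x3, z):
    """Hypothetical helper: returns the step 1 and step 3 coefficient vectors."""
    const = np.ones(len(y))
    # Step 1: OLS of y on the original regressors
    X = np.column_stack([const, x1, x2, x3])
    beta_step1, _ = ols(X, y)
    # Step 2: first stage, x3 on the exogenous regressors and the instrument z
    W = np.column_stack([const, x1, x2, z])
    _, u = ols(W, x3)
    # Step 3: add the first-stage residual u as an extra regressor
    beta_step3, _ = ols(np.column_stack([X, u]), y)
    return beta_step1, beta_step3  # beta_step3[4] is rho-hat

# Simulated (hypothetical) data: x3's own noise is independent of y's error
rng = np.random.default_rng(0)
n = 5000
x1, x2, z = rng.normal(size=(3, n))
x3 = 0.5 * x1 - 0.3 * x2 + 1.0 * z + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 1.5 * x3 + rng.normal(size=n)

beta1, beta3 = control_function_steps(y, x1, x2, x3, z)
print("step 1:", beta1.round(3))
print("step 3:", beta3[:4].round(3), "rho:", beta3[4].round(3))
```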

The paper says that this is an endogeneity test: if the estimated coefficients in step 3 are very similar to those in step 1, then the regressor $x_3$ was not endogenous.

Could anyone explain the intuition behind this test to me?

Best Answer

What you are looking at is formally known as the control function approach. When you run your first stage $$x_3 = b_0 + b_1x_1 +b_2x_2 + b_3z + u$$ you split the variation in $x_3$ into two parts: the exogenous variation (the fitted values, which come from the exogenous regressors and the instrument) and the remaining "bad" variation, the residual $u$, which is the part that is correlated with the error $e$ of your original regression.
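To see the split concretely, here is a small simulation sketch. It assumes a hypothetical data-generating process (not from the paper) in which $x_3$ has its own first-stage error $v$ and the structural error is $e = 0.8\,v + \epsilon$. The fitted values of the first stage should be essentially uncorrelated with $e$, while the residual $u$ carries the correlation.

```python
import numpy as np

# Hypothetical simulated data: v is the part of x3 that also enters the error e
rng = np.random.default_rng(3)
n = 5000
x1, x2, z, v, eps = rng.normal(size=(5, n))
x3 = 0.5 * x1 - 0.3 * x2 + z + v
e = 0.8 * v + eps                     # structural error, correlated with x3 via v

# First stage: x3 on a constant, the exogenous regressors and the instrument
W = np.column_stack([np.ones(n), x1, x2, z])
b, *_ = np.linalg.lstsq(W, x3, rcond=None)
x3_fitted = W @ b                     # exogenous part of x3
u = x3 - x3_fitted                    # leftover ("bad") part of x3

corr_fitted = np.corrcoef(x3_fitted, e)[0, 1]
corr_u = np.corrcoef(u, e)[0, 1]
print("corr(fitted, e):", corr_fitted.round(3))
print("corr(u, e):     ", corr_u.round(3))
```

By construction $u$ is orthogonal to every column of the first-stage design, so whatever correlation $x_3$ has with $e$ can only live in $u$.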

You know that when you regress $$y = \beta_0 + \beta_1x_1 + \beta_2 x_2 + \beta_3x_3 + e$$ part of your endogenous variable is correlated with $e$, i.e. it is contained in the error term. This part is captured by $u$ from the first stage. So you can think of $e$ as a composite error $e = \rho u + \epsilon$, where $\epsilon$ is no longer correlated with $x_3$ (formally this isn't the right way of making the point, but it is intuitive). Therefore, if you regress $$y = \beta_0 + \beta_1x_1 + \beta_2 x_2 + \beta_3x_3 + \rho u + \epsilon$$ there is no endogeneity problem anymore: the part of $x_3$ that was correlated with $e$ is no longer in the error term, because it now enters the regression explicitly as $u$.
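The composite-error intuition can be checked by simulation. The sketch below uses a hypothetical setup (all parameters are illustrative): the first-stage error is $v$ and the structural error is $e = 0.8\,v + \epsilon$, so $x_3$ is endogenous. With these numbers the OLS slope on $x_3$ converges to roughly $1.5 + 0.8/2 = 1.9$ rather than the true $1.5$, while adding the first-stage residual should bring it back near $1.5$ and give $\hat\rho$ close to $0.8$.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients and residuals for y = X @ coef + error."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

# Hypothetical simulated data with an endogenous x3
rng = np.random.default_rng(1)
n = 5000
x1, x2, z, v, eps = rng.normal(size=(5, n))
x3 = 0.5 * x1 - 0.3 * x2 + 1.0 * z + v      # v is the "bad" variation in x3
e = 0.8 * v + eps                           # structural error shares v with x3
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 1.5 * x3 + e

const = np.ones(n)
X = np.column_stack([const, x1, x2, x3])
beta_ols, _ = ols(X, y)                                # biased: x3 correlated with e
_, u = ols(np.column_stack([const, x1, x2, z]), x3)    # u estimates v
beta_cf, _ = ols(np.column_stack([X, u]), y)           # u absorbs the 0.8*v part of e

print("true beta3: 1.5")
print("OLS beta3: ", round(beta_ols[3], 3))
print("CF  beta3: ", round(beta_cf[3], 3), " rho:", round(beta_cf[4], 3))
```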

If you run 2SLS instead, you will notice that $\beta_3$ (in fact every $\beta$) takes exactly the same value as in the control function approach (see this related question and its answer). In essence, your authors are restating the Hausman test: the usual t-test of $\rho = 0$ in the augmented regression is the regression-based form of the Durbin-Wu-Hausman test. You know that the control function approach and 2SLS give consistent estimates. Therefore, if these estimates are not significantly different from the OLS estimates, the bias in OLS cannot be large (assuming the instrument is valid and strong).
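Both claims can be sketched numerically. Under a hypothetical simulated setup (parameters are illustrative), the code below computes 2SLS by projecting the regressors onto the instrument space, runs the control function regression, checks that the coefficients coincide, and forms the conventional OLS t-statistic on $\hat\rho$ as the regression-based Hausman test.

```python
import numpy as np

# Hypothetical simulated data: x3 is endogenous through the shared term v
rng = np.random.default_rng(2)
n = 2000
x1, x2, z, v, eps = rng.normal(size=(5, n))
x3 = 0.5 * x1 - 0.3 * x2 + z + v
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 1.5 * x3 + 0.5 * v + eps

const = np.ones(n)
X = np.column_stack([const, x1, x2, x3])     # structural regressors
W = np.column_stack([const, x1, x2, z])      # exogenous regressors + instrument

# 2SLS: regress y on the projection of X onto the column space of W
X_hat = W @ np.linalg.lstsq(W, X, rcond=None)[0]
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]

# Control function: the first-stage residual is x3 minus its projection
u = x3 - X_hat[:, 3]
Xcf = np.column_stack([X, u])
coef_cf, *_ = np.linalg.lstsq(Xcf, y, rcond=None)

# Conventional OLS standard error of rho-hat and the t-statistic for rho = 0
resid = y - Xcf @ coef_cf
k = Xcf.shape[1]
sigma2 = resid @ resid / (n - k)
cov = sigma2 * np.linalg.inv(Xcf.T @ Xcf)
t_rho = coef_cf[4] / np.sqrt(cov[4, 4])

print("2SLS:", beta_2sls.round(3))
print("CF:  ", coef_cf[:4].round(3))
print("identical:", np.allclose(beta_2sls, coef_cf[:4]))
print("t-statistic on rho:", round(t_rho, 2))
```

Under the null $\rho = 0$ the usual t-statistic on $\hat\rho$ is valid. The conventional standard errors on the other coefficients in this regression are not the correct 2SLS standard errors, because $u$ is a generated regressor, so use a proper 2SLS routine for inference on $\beta$.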
