Solved – Endogenous interaction term in a triangular system using control function (CF) approach

econometrics, endogeneity, stata

I am trying to estimate a model using the control function approach.

(I will write all the variables in scalar form so that it is easier to explain.)

The main equation:

$y_1 = \beta_0 + \beta_1 \times w + \beta_2 \times y_2 + \beta_3 \times w \ast y_2 + \beta_4 \times x_1 + \beta_5 \times x_2 + u_1$

where the interaction term $w\ast y_2$ is mainly of interest. $w$ is exogenous. $x_1$ and $x_2$ are other control variables. However, I have theoretical reasons to believe that $y_2$ is endogenous and affected by $w$, something like:

$y_2 = \gamma_0 + \gamma_1 \times w + \gamma_2 \times z + \gamma_3 \times x_1 + \gamma_4 \times x_2 + u_2$

where $z$ is a scalar or a set of additional exogenous variables.
Here are the steps I followed to estimate this model:

  1. Regress $y_2$ on $w, z, x_1, x_2$ and obtain the residuals $\hat{u}_2$.
  2. Regress $y_1$ on $w, y_2, w \ast y_2, x_1, x_2, \hat{u}_2$.

Is this the correct approach? I am a little confused because Wooldridge (2010, p. 270) has this example:

$y_1 = \boldsymbol{z}_1 \boldsymbol{\delta}_1 + \alpha_{11} y_2 + \alpha_{12} y_3 + \alpha_{13} y_2 y_3 + y_2 \boldsymbol{z}_1 \boldsymbol{\gamma}_{11} + y_3 \boldsymbol{z}_1 \boldsymbol{\gamma}_{12}+u_1$

$y_2 = \boldsymbol{z}\boldsymbol{\beta}_2 + u_2$

$y_3 = \boldsymbol{z}\boldsymbol{\beta}_3 + u_3$

$\boldsymbol{z}$ and $\boldsymbol{z}_1$ are exogenous.

The estimation is done by regressing $y_{i1}$ on $\boldsymbol{z}_{i1}, y_{i2}, y_{i3}, y_{i2}y_{i3}, y_{i2}\boldsymbol{z}_{i1}, y_{i3}\boldsymbol{z}_{i1}, \hat{u}_{i2}, \hat{u}_{i3}$.

In the first equation, $\boldsymbol{z}_1$ contains the only exogenous variables. Does this mean that the control variables belong in this vector, and that I should include interaction terms between $y_2$ (or $y_3$) and each of the variables in $\boldsymbol{z}_1$, including all the controls? It does not feel right, but the text seems to suggest so.

There are a number of posts online about endogenous interaction terms, and 2SLS seems to be the common method. If the control function approach does not work here, how would I estimate my model using Stata?

Any help is appreciated!

Best Answer

You have two endogenous variables, $y_2$ and $w * y_2$. You should:

  1. Regress $y_2$ on $w$, $z$, $w * z$, and the $x$'s. Obtain the residuals, $\hat{u}_2$.
  2. Regress $w*y_2$ on $w$, $z$, $w * z$, and the $x$'s. Obtain the residuals, $\hat{u}_3$.
  3. Regress $y_1$ on $w$, $y_2$, $w*y_2$, the $x$'s, $\hat{u}_2$ and $\hat{u}_3$.
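
To see why $w * z$ appears in both first stages, multiply the question's equation for $y_2$ by $w$. This is only a sketch of the reasoning, and it assumes that $u_2$ has mean zero conditional on $w$, $z$ and the $x$'s:

$w * y_2 = \gamma_0 w + \gamma_1 w^2 + \gamma_2 (w * z) + \gamma_3 (w * x_1) + \gamma_4 (w * x_2) + w * u_2$

As long as $\gamma_2 \neq 0$, $w * z$ shifts $w * y_2$ and is excluded from the main equation, so it can act as the extra instrument needed for the second endogenous regressor. The same expansion contains $w^2$ and the $w * x$ terms, which could in principle be added as further instruments, but the steps above do not rely on them.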

For two-stage least squares, instead of the residuals in steps 1 and 2, you obtain the fitted values $\hat{y}_2$ and $\widehat{w*y_2}$, and then in step 3 you regress $y_1$ on $w$, $\hat{y}_2$, $\widehat{w*y_2}$, and the $x$'s. Either procedure gives the same coefficient estimates on $w$, $y_2$, $w*y_2$, and the $x$'s.
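
The equivalence of the two procedures can be checked numerically. The following is a minimal sketch on simulated data, not the answerer's code: the coefficient values, sample size, seed, and the helper names `ols` and `fitted` are all made up for illustration, and the regression is written in plain Python (normal equations) only to keep the example self-contained.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def fitted(X, b):
    """Fitted values X b."""
    return [sum(xi * bi for xi, bi in zip(row, b)) for row in X]

# Simulated data (all parameter values are hypothetical).
rng = random.Random(0)
n = 2000
rows = []
for _ in range(n):
    w, z, x1, x2 = (rng.gauss(0, 1) for _ in range(4))
    u2 = rng.gauss(0, 1)
    u1 = 0.8 * u2 + 0.6 * rng.gauss(0, 1)  # u1 correlated with u2: y2 is endogenous
    y2 = 0.5 + 0.7 * w + 1.0 * z + 0.3 * x1 - 0.2 * x2 + u2
    y1 = 1.0 + 0.5 * w + 1.0 * y2 + 0.6 * w * y2 + 0.4 * x1 - 0.3 * x2 + u1
    rows.append((w, z, x1, x2, y2, y1))

y2 = [r[4] for r in rows]
wy2 = [r[0] * r[4] for r in rows]
y1 = [r[5] for r in rows]

# Steps 1 and 2: first stages on constant, w, z, w*z, x1, x2.
Z = [[1.0, w, z, w * z, x1, x2] for w, z, x1, x2, _, _ in rows]
y2_hat = fitted(Z, ols(Z, y2))
wy2_hat = fitted(Z, ols(Z, wy2))
u2_hat = [a - b for a, b in zip(y2, y2_hat)]
u3_hat = [a - b for a, b in zip(wy2, wy2_hat)]

# Step 3, control function: actual regressors plus both residuals.
X_cf = [[1.0, r[0], r[4], r[0] * r[4], r[2], r[3], e2, e3]
        for r, e2, e3 in zip(rows, u2_hat, u3_hat)]
b_cf = ols(X_cf, y1)

# Step 3, manual 2SLS: fitted values replace the endogenous regressors.
X_2sls = [[1.0, r[0], f2, f3, r[2], r[3]]
          for r, f2, f3 in zip(rows, y2_hat, wy2_hat)]
b_2sls = ols(X_2sls, y1)

# Largest gap across constant, w, y2, w*y2, x1, x2.
max_gap = max(abs(a - b) for a, b in zip(b_cf[:6], b_2sls))
print("largest coefficient gap between CF and 2SLS:", max_gap)
```

The two sets of coefficients should agree up to floating-point error. The standard errors printed by either hand-rolled second stage are not valid as they stand, because they ignore that the residuals or fitted values were themselves estimated. In Stata, the usual route is to generate the two products as variables and pass them to `ivregress 2sls`, with $y_2$ and $w*y_2$ as the endogenous regressors and $z$ and $w*z$ as the excluded instruments.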
