Solved – 2SLS with endogenous interaction term

Tags: 2sls, instrumental-variables, interaction, networks, regression

I am trying to estimate a peer effects model in which a certain characteristic of both the individual and the peers might be endogenous:

$$y_i = \alpha + \beta_1 Controls_i + \gamma_1 Endog_i + \gamma_2 \sum_{j\neq i} Endog_j \cdot Link_{ij} + \epsilon_i$$

I found two links on this topic: http://www.stata.com/statalist/archive/2012-05/msg01165.html

Basic 2SLS IV Questions in Stata

Both suggest using the regular ivreg2 command with, as instrument for the interaction term, an interaction between the instrument and the exogenous variable (in my case $Link_{ij}$).
Originally, I was planning a different approach: run OLS of the endogenous variable on two IVs by hand and use the predicted values in the second stage. That way I would use the interaction of the predictions instead of interacting the instruments themselves, and I would correct the standard errors by bootstrapping the second-stage regression.

The rationale is that, in a second step, I would also like to allow the network variable $Link_{ij}$ to be endogenous, to account for endogenous network formation. The only way I see to do this is to first predict both variables of the interaction term (using all the IVs), then compute the interaction term for the second stage, again correcting the standard errors with a bootstrap.

Is this a valid way to estimate this equation? And could I use my original approach as well?

Thank you!

Best Answer

At a high level, if you have a valid instrument for $Endog$, then the interaction between that instrument and $Link$ is a valid instrument for the interaction term. So no problems there.
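As a sketch of that point in R (assuming the AER package's ivreg; the Stata analogue would be ivreg2 with the instrument and its interaction with the link variable), with illustrative variable names:

```r
# Sketch: instrument an endogenous x and its interaction with an
# exogenous link using z and z:link (all names here are illustrative)
library(AER)  # assumes the AER package is installed
set.seed(1)
n <- 5000
z <- rnorm(n)                  # instrument for x
u <- rnorm(n)                  # unobserved confounder
link <- rbinom(n, 1, 0.5)      # exogenous link indicator
x <- z + u + rnorm(n)          # endogenous regressor
y <- 1 + 0.5*x + 0.4*link + 0.3*x*link + u + rnorm(n)

# Instruments: z, link itself, and the z:link interaction
fit <- ivreg(y ~ x + link + x:link | z + link + z:link)
summary(fit)
```

With a valid z, the coefficients on x and x:link should recover 0.5 and 0.3 up to sampling error.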

I lack the intuition for dealing with the second part of the problem. So why not simulate? In R:

# Data-generating process
set.seed(123)  # for reproducibility
n <- 100000

# Create instruments
z1 <- rnorm(n); z2 <- rnorm(n)

# Create unobserved variables
naughty1 <- rnorm(n); naughty2 <- rnorm(n)

# Create endogenous regressors
x1 <- z1 + naughty1 + rnorm(n)
x2 <- z2 + naughty2 + rnorm(n)

# Create outcome variable (true interaction coefficient is 2)
y <- 1 + 0.5*x1 + 0.2*x2 + 0.7*naughty1 - 0.5*naughty2 + 2*x1*x2 + rnorm(n)

# Verify that we get biased estimates
mod <- lm(y ~ x1 + x2 + x1:x2)
summary(mod)
# We get fairly strong bias for the coefficients on x1 and x2

# Now we use your method
x1hat <- predict(lm(x1 ~ z1))
x2hat <- predict(lm(x2 ~ z2))

mod2 <- lm(y ~ x1hat + x2hat + x1hat:x2hat)
summary(mod2)
# Hey presto - it works; no more bias. 

Naturally, you'd need to bootstrap the confidence intervals to account for the two stages. But what you've proposed should work. Note that it's not particularly efficient: even in this simple case, I had to dial the number of observations up quite high before the estimates zeroed in on the true parameters.
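For completeness, a minimal sketch of that bootstrap (resampling rows and redoing both stages in each draw), using the same data-generating process as above but with a smaller n so it runs quickly:

```r
# Bootstrap the two-stage procedure: resample rows, refit both
# stages, and take the SD of the second-stage coefficients
set.seed(42)
n <- 2000
z1 <- rnorm(n); z2 <- rnorm(n)
naughty1 <- rnorm(n); naughty2 <- rnorm(n)
x1 <- z1 + naughty1 + rnorm(n)
x2 <- z2 + naughty2 + rnorm(n)
y <- 1 + 0.5*x1 + 0.2*x2 + 0.7*naughty1 - 0.5*naughty2 + 2*x1*x2 + rnorm(n)

B <- 200
boot <- replicate(B, {
  i <- sample(n, replace = TRUE)        # resample rows
  x1h <- fitted(lm(x1[i] ~ z1[i]))      # redo first stage for x1
  x2h <- fitted(lm(x2[i] ~ z2[i]))      # redo first stage for x2
  coef(lm(y[i] ~ x1h + x2h + x1h:x2h))  # redo second stage
})
apply(boot, 1, sd)  # bootstrap SEs: intercept, x1, x2, interaction
```

Plain second-stage OLS standard errors would ignore the estimation error in the first stage; resampling both stages together avoids that.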