High Collinearity between Instrument and Endogenous Variable in IV Estimation: Weak Instrument Problem

2sls, econometrics, endogeneity, instrumental-variables, tobit-regression

I am estimating an IV Tobit model with one endogenous variable X and one instrument Z.

$$Y = X\beta + \text{covariates} + \epsilon$$ where $\operatorname{Cov}(X,\epsilon) \ne 0$ because $X$ is endogenous. I am estimating the model by IV in the 2SLS framework, using $Z$ as the instrument.

$Z$ and $X$, however, are highly collinear ($\operatorname{corr}(Z,X)=0.8$), which unsurprisingly leads the test for instrument relevance in the first stage

$$X = \delta Z + \text{covariates} + \eta$$ $$H_0: \delta = 0$$ to yield a very high t-statistic (> 30).
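A first-stage relevance test like the one described can be sketched as below. This is a minimal illustration on hypothetical simulated data with $\operatorname{corr}(X,Z) \approx 0.8$ and no covariates (an assumption; the actual model includes them). The helper name `first_stage_t_and_f` is made up for the example. With one instrument, the F-statistic for $H_0: \delta = 0$ equals the square of the t-statistic when both use the classical homoskedastic variance.

```python
import numpy as np

def first_stage_t_and_f(x, z):
    """Hypothetical helper: regress x on a constant and z by OLS.

    Returns the t-statistic on z and the F-statistic for H0: delta = 0,
    both based on the classical (homoskedastic) variance estimate.
    """
    n = len(x)
    Z = np.column_stack([np.ones(n), z])           # intercept + instrument
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)   # first-stage coefficients
    resid = x - Z @ coef
    df = n - Z.shape[1]
    s2 = resid @ resid / df                        # residual variance
    se = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[1, 1])
    t = coef[1] / se
    # F-test of the single restriction delta = 0: restricted vs unrestricted SSR
    ssr_u = resid @ resid
    ssr_r = np.sum((x - x.mean()) ** 2)            # intercept-only (restricted) model
    f = (ssr_r - ssr_u) / (ssr_u / df)
    return t, f

# Hypothetical data: corr(x, z) is about 0.8 by construction (0.8**2 + 0.6**2 = 1)
rng = np.random.default_rng(0)
n = 1000
z = rng.normal(size=n)
x = 0.8 * z + 0.6 * rng.normal(size=n)
t, f = first_stage_t_and_f(x, z)
print(f"t = {t:.1f}, F = {f:.1f}, t^2 = {t**2:.1f}")
```

With a correlation this strong and a sample of this size, a t-statistic well above 30 is what one would expect.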

Although I have found literature stating that high collinearity between instruments yields weak-instrument problems, does high collinearity between the endogenous variable and the instrument cause "weak instrument" problems such as coefficient bias and loss of precision (larger standard errors)?

Because $\hat \beta_{IV}$ is simply the ratio of the reduced-form coefficient to the first-stage coefficient, as the first-stage coefficient tends to 1 (under perfect collinearity), $\hat \beta_{IV}$ should converge to the reduced-form coefficient estimate. Is this problematic for IV estimation?
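The ratio representation can be checked numerically for the linear case. The sketch below assumes a hypothetical linear data-generating process with one endogenous regressor, one instrument, and no covariates; it compares the reduced-form/first-stage ratio (indirect least squares) with 2SLS computed by regressing $y$ on first-stage fitted values. In this just-identified linear setting the two coincide exactly; it does not say anything about Tobit IV.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
beta = 2.0                               # hypothetical true coefficient
z = rng.normal(size=n)
eta = rng.normal(size=n)                 # first-stage error
eps = 0.5 * eta + rng.normal(size=n)     # eps correlated with eta, so x is endogenous
x = 0.8 * z + eta
y = beta * x + eps

def ols(X, v):
    """OLS coefficients of v on the columns of X."""
    return np.linalg.lstsq(X, v, rcond=None)[0]

const = np.ones(n)
Zmat = np.column_stack([const, z])
pi_hat = ols(Zmat, x)[1]                 # first-stage coefficient on z
rf_hat = ols(Zmat, y)[1]                 # reduced-form coefficient on z
beta_ratio = rf_hat / pi_hat             # indirect least squares (Wald ratio)

# 2SLS: second stage regresses y on the first-stage fitted values
x_hat = Zmat @ ols(Zmat, x)
beta_2sls = ols(np.column_stack([const, x_hat]), y)[1]
print(f"ratio = {beta_ratio:.4f}, 2SLS = {beta_2sls:.4f}")
```

If $X$ were an exact copy of $Z$, `pi_hat` would be 1 and the IV estimate would equal the reduced-form coefficient, which in that case is also the OLS coefficient of $Y$ on $X$.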

Best Answer

Not sure where you read that high collinearity between the instrument and the endogenous variable leads to a weak-instrument problem; it's actually the other way around.

If you estimate $$y = \beta x + \epsilon$$ by OLS and your $x$ is correlated with the error term, then the (asymptotic) bias of OLS is $$\operatorname{plim}\widehat{\beta}_{ols} - \beta = \frac{Cov(x,\epsilon)}{Var(x)}$$
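A quick simulation makes the OLS bias formula concrete. This is a sketch under a hypothetical data-generating process in which all variables have mean zero (so no intercept is fitted), $Cov(x,\epsilon) = 0.5$ and $Var(x) = 0.8^2 + 1 = 1.64$; the large sample size is only there so the simulated bias sits close to its large-sample value.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta = 2.0                               # hypothetical true coefficient
z = rng.normal(size=n)
eta = rng.normal(size=n)
eps = 0.5 * eta + rng.normal(size=n)     # Cov(x, eps) = 0.5, coming through eta
x = 0.8 * z + eta                        # Var(x) = 0.64 + 1 = 1.64
y = beta * x + eps

b_ols = (x @ y) / (x @ x)                # OLS without intercept (mean-zero variables)
ols_bias = b_ols - beta
predicted_bias = 0.5 / (0.8**2 + 1.0)    # Cov(x, eps) / Var(x) from population values
print(f"simulated bias = {ols_bias:.3f}, predicted = {predicted_bias:.3f}")
```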

Suppose you then have a first stage, $$x = \pi z + \eta$$

If $\epsilon$ and $\eta$ are correlated, then, because $z$ is uncorrelated with $\epsilon$, $Cov(x,\epsilon) = Cov(\eta,\epsilon) = \sigma_{\epsilon \eta}$, so the bias of OLS can also be written as $\frac{\sigma_{\epsilon \eta}}{\sigma^2_{x}}$. You can then show that the bias of 2SLS is approximately $$E[\widehat{\beta}_{2sls} - \beta] \approx \frac{1}{1 + F}\frac{\sigma_{\epsilon \eta}}{\sigma^2_{\eta}} $$

where $F$ is the first-stage $F$ statistic (strictly, its population analogue; for a single endogenous variable and a single instrument the $F$ statistic is just the square of the $t$ statistic on the instrument). If your instrument and the endogenous variable are only weakly correlated, then $F \rightarrow 0$ and the bias of 2SLS approaches $\frac{\sigma_{\epsilon \eta}}{\sigma^2_{\eta}}$. Note that if $\pi = 0$ in the first stage, then $\sigma^2_{x} = \sigma^2_{\eta}$ and therefore $\frac{\sigma_{\epsilon \eta}}{\sigma^2_{\eta}} = \frac{\sigma_{\epsilon \eta}}{\sigma^2_{x}}$, so the bias of 2SLS becomes the bias of OLS.
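The approximation can be written as a small function to see both limits. This is a sketch; the function names and the error moments ($\sigma_{\epsilon\eta} = 0.5$, $\sigma^2_\eta = 1$, $Var(z) = 1$) are hypothetical. Plugging a reported first-stage statistic in for the population $F$ is only a rough guide.

```python
def approx_2sls_bias(F, sigma_eps_eta, sigma2_eta):
    """Approximate 2SLS bias: (sigma_eps_eta / sigma2_eta) * 1 / (1 + F).

    F is the first-stage F (strictly, its population analogue).
    """
    return sigma_eps_eta / sigma2_eta / (1.0 + F)


def ols_bias(pi, var_z, sigma_eps_eta, sigma2_eta):
    """OLS bias sigma_eps_eta / Var(x), using Var(x) = pi^2 Var(z) + sigma2_eta."""
    return sigma_eps_eta / (pi**2 * var_z + sigma2_eta)


# Hypothetical error moments
sigma_eps_eta, sigma2_eta = 0.5, 1.0

# pi = 0 (so F = 0): the 2SLS approximation collapses to the OLS bias
print(approx_2sls_bias(0.0, sigma_eps_eta, sigma2_eta),
      ols_bias(0.0, 1.0, sigma_eps_eta, sigma2_eta))

# Treating a first-stage t of 30 as F = t^2 = 900 shrinks the bias by a factor of 901
print(approx_2sls_bias(30.0**2, sigma_eps_eta, sigma2_eta))
```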

So the bias of 2SLS becomes small only if the endogenous variable and the instrument are highly correlated: only then can $F \rightarrow \infty$, which drives the bias of 2SLS towards zero. Of course, all of this is from the viewpoint of the standard linear 2SLS model, but the general idea carries over to the Tobit IV model (see, for example, Finlay and Magnusson (2009) "Implementing Weak Instrument Robust Tests for a General Class of Instrumental Variables Models", The Stata Journal Vol. 9(3), pp. 398-421).
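A small Monte Carlo sketch of the same point for the linear model: it varies the first-stage coefficient $\pi$ under a hypothetical data-generating process ($n = 200$, $Corr(\epsilon,\eta) = 0.5$, unit error variances) and reports the median bias of just-identified IV. The median is used instead of the mean because the just-identified IV estimator has no finite moments, so a simulated mean would be unstable. With $\pi = 0$ the median bias should sit near the OLS bias of 0.5; with a strong first stage it should be close to zero.

```python
import numpy as np

def median_iv_bias(pi, n=200, reps=2000, rho=0.5, seed=0):
    """Median bias of just-identified IV over simulated samples (hypothetical DGP)."""
    rng = np.random.default_rng(seed)
    beta = 1.0
    est = np.empty(reps)
    for r in range(reps):
        z = rng.normal(size=n)
        eta = rng.normal(size=n)                                   # first-stage error
        eps = rho * eta + np.sqrt(1 - rho**2) * rng.normal(size=n)  # Corr(eps, eta) = rho
        x = pi * z + eta
        y = beta * x + eps
        zc = z - z.mean()
        est[r] = (zc @ y) / (zc @ x)     # IV estimate with an intercept: Cov(z, y) / Cov(z, x)
    return np.median(est) - beta

for pi in (0.0, 0.1, 0.5):
    print(pi, round(median_iv_bias(pi), 3))
```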

For further reference see

  • Wooldridge (2010) "Econometric Analysis of Cross Section and Panel Data"
  • Angrist and Pischke (2009) "Mostly Harmless Econometrics"

As a side note: in your question you said that the IV/2SLS coefficient of your endogenous variable is the instrument's reduced-form coefficient divided by its first-stage coefficient. That's true for just-identified linear 2SLS (one instrument, one endogenous variable), but not for Tobit IV.
