Proof that omitted variable bias may lead to endogeneity

Tags: bias, econometrics, endogeneity, proof, regression

I am looking for a proof that omitted variable bias (OVB) in OLS regression may lead to endogeneity. I have found many examples, here and elsewhere, of how to show that a given parameter $b_{j}$ (for $j=1,\dots,J$ parameters in the model) is biased, for instance these two threads:

But this is not exactly what I want. What I want is a more generic proof that a given variable $X_j$ gets correlated with the error term $e$ when there is OVB, i.e., that ${\rm Cov}(X_j,e) \ne 0$.

For instance, let's say the correct equation would be:
$$Y = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + u$$
But we estimate the following:
$$Y = b_0 + b_1 X_1 + b_2 X_2 + e,$$
where we are omitting $X_3$ and of course its coefficient $b_3$.

Assuming that ${\rm Cov}(X_3, X_2) \ne 0$, how is it possible to prove that ${\rm Cov}(X_2,e) \ne 0$ and therefore $X_2$ is endogenous due to OVB, instead of just calculating the amount of bias in $b_2$?

Best Answer

To prove this, start from the probability limit of the OLS estimator. Let $X$ denote the full matrix of regressors actually used, $[1,X_1,X_2]$, and let $e \equiv u + b_3 X_3$. Also, let $b$ be the vector of parameters we are trying to estimate, i.e. $b = (b_0,b_1,b_2)'$, so that the estimated equation is $Y = Xb + e$, and let $\hat{\beta}$ be the OLS estimator of $b$.

\begin{align*}
p\lim \hat{\beta} &= p\lim \left[ (X'X)^{-1}X'Y \right] \\
&= p\lim \left[ (X'X)^{-1}X'(Xb + e) \right] \\
&= p\lim \left[ (X'X)^{-1}X'Xb \right] + p\lim \left[ (X'X)^{-1}X'e \right] \\
&= p\lim \left[ (X'X)^{-1}X'X \right] b + p\lim \left[ (X'X)^{-1}X'(b_3 X_3 + u) \right] \\
&= b + b_3 \, p\lim \left[ (X'X)^{-1}X' X_3 \right] + p\lim \left[ (X'X)^{-1}X'u \right] \\
&= b + b_3 \, p\lim \left[ (X'X)^{-1}X' X_3 \right] \\
&= b + b_3 [\mathbb{E}(X'X)]^{-1} \mathbb{E}(X' X_3)
\end{align*}

Above, a key step is of course that $p\lim \left[ (X'X)^{-1}X'u \right] =0$, which happens because

$$ p\lim \left[ (X'X)^{-1}X'u \right] = \left( p\lim \tfrac{1}{n} X'X \right)^{-1} p\lim \left( \tfrac{1}{n} X'u \right) = [\mathbb{E}(X'X)]^{-1} \mathbb{E}(X'u) = 0, $$ where $n$ is the sample size and the last equality uses $\mathbb{E}(X'u)=0$. This holds because the original assumption is that each of the regressors is uncorrelated with $u$ (but not necessarily with $e$).
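The parenthetical remark about $e$ can be made explicit, which gives the covariance statement the question asks for. This is a short sketch that assumes, as in the true model, ${\rm Cov}(X_2,u)=0$, and additionally that $b_3 \ne 0$ (otherwise $X_3$ would not belong in the equation at all):

\begin{align*}
{\rm Cov}(X_2, e) &= {\rm Cov}(X_2, u + b_3 X_3) \\
&= {\rm Cov}(X_2, u) + b_3 \, {\rm Cov}(X_2, X_3) \\
&= b_3 \, {\rm Cov}(X_2, X_3)
\end{align*}

So under these assumptions ${\rm Cov}(X_2,e) \ne 0$ exactly when ${\rm Cov}(X_2,X_3) \ne 0$. The same argument applies to $X_1$ with ${\rm Cov}(X_1,X_3)$ in place of ${\rm Cov}(X_2,X_3)$.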

Now we see that $p\lim \hat{\beta} \ne b$ whenever $b_3 \ne 0$ and $\mathbb{E}(X'X_3) \ne 0$. In particular, at least one of the slope estimates is inconsistent whenever there is correlation between $X_1$ and $X_3$ or between $X_2$ and $X_3$.
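As a rough numerical check of the limit derived above, the following sketch simulates the setup from the question with NumPy. Everything specific here is an assumption chosen for illustration and is not part of the original problem: the coefficient values, the normal draws, the dependence `x3 = 0.8 * x2 + noise`, the sample size, and the seed.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
n = 200_000                     # large n so sample moments sit near their limits

# Hypothetical true parameters (illustrative values only)
b = np.array([1.0, 2.0, -1.0])  # b0, b1, b2
b3 = 1.5

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.8 * x2 + rng.normal(size=n)  # assumed design: Cov(X2, X3) = 0.8
u = rng.normal(size=n)              # true error, independent of all regressors

X = np.column_stack([np.ones(n), x1, x2])  # regressors actually used
y = X @ b + b3 * x3 + u                    # data generated by the true model
e = b3 * x3 + u                            # error of the misspecified model

# OLS of y on [1, x1, x2], i.e. with x3 omitted
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Sample analogue of b + b3 [E(X'X)]^{-1} E(X'X3)
predicted = b + b3 * np.linalg.solve(X.T @ X / n, X.T @ x3 / n)

cov_x2_u = np.cov(x2, u)[0, 1]  # population value is 0
cov_x2_e = np.cov(x2, e)[0, 1]  # population value is b3 * 0.8 = 1.2

print("OLS estimate:      ", beta_hat)
print("b + b3 * correction:", predicted)
print("Cov(x2, u):", cov_x2_u)
print("Cov(x2, e):", cov_x2_e)
```

In this design only the coefficient on `x2` should drift away from its true value, since `x1` is generated independently of `x3` and `x3` has mean zero, so the intercept and the coefficient on `x1` stay close to `b0` and `b1`.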