Solved – OLS estimation with omitted variable and multicollinearity

bias, least-squares, multicollinearity, regression

I am wondering if:

  1. omitting an important variable

and

  2. having very high correlation between two independent variables

cause the OLS estimators to become biased. If so, can you please provide a short explanation?

Best Answer

(1) Yes, leaving out a relevant regressor can bias the OLS estimates of the remaining coefficients, under certain conditions.

That is, if $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + U$ is the true data-generating process and you estimate a regression of $Y$ on $X_1$ alone, then it can be shown that

$$ \hat \beta_1^\ast \overset{p}{\to} \beta_1 + \beta_2\frac{Cov(X_1,X_2)}{Var(X_1)}.$$
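To see where this comes from (a sketch, assuming $Cov(X_1,U)=0$ and that sample moments converge to their population counterparts): the slope from regressing $Y$ on $X_1$ alone is the sample analogue of $Cov(X_1,Y)/Var(X_1)$, and substituting the true model for $Y$ gives

$$\frac{Cov(X_1,Y)}{Var(X_1)} = \frac{Cov(X_1,\,\beta_0 + \beta_1 X_1 + \beta_2 X_2 + U)}{Var(X_1)} = \beta_1 + \beta_2\frac{Cov(X_1,X_2)}{Var(X_1)}.$$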

Thus, $\hat \beta_1^\ast$ will be (asymptotically) biased if and only if both $\beta_2\neq 0$ and $X_1$ and $X_2$ are correlated.
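A short simulation illustrates the formula (a minimal sketch; the coefficients, sample size, and correlation below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# True model: Y = 1 + 2*X1 + 3*X2 + U, with X1 and X2 correlated.
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)   # Cov(X1, X2) = 0.5, Var(X1) = 1
u = rng.normal(size=n)
y = 1 + 2 * x1 + 3 * x2 + u

# "Short" regression: Y on X1 only (intercept included), omitting X2.
X_short = np.column_stack([np.ones(n), x1])
beta_short = np.linalg.lstsq(X_short, y, rcond=None)[0]

# The slope converges to beta1 + beta2*Cov(X1,X2)/Var(X1) = 2 + 3*0.5 = 3.5,
# not the true beta1 = 2.
print(beta_short[1])
```

With these numbers the estimated slope settles near 3.5 rather than the true value 2, which is exactly the $\beta_2\,Cov(X_1,X_2)/Var(X_1)$ term above.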

(2) Collinearity will not bias your estimates. However, it can make your regression hard, or even impossible, to estimate meaningfully.

In the extreme case, if you have perfect correlation between two variables, then you cannot even estimate the regression, because you will not be able to take the inverse of $(X'X)$ when calculating $\hat \beta = (X'X)^{-1}X'Y$.
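For instance (a minimal sketch with made-up data; numerically, the inversion may either raise an error or silently return meaningless values):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = 2 * x1                           # perfectly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))     # 2, not 3: X'X is rank deficient

# Inverting a singular matrix fails outright or yields garbage:
try:
    print(np.linalg.inv(XtX))
except np.linalg.LinAlgError:
    print("X'X is singular and cannot be inverted")
```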

However, if you have a very high level of correlation between two variables $X_1$ and $X_2$, then the variances of the two estimators $\hat\beta_1$ and $\hat\beta_2$ will be large. Intuitively, the regression cannot tell whether to attribute the effect of increasing the variables (which move together) to $\beta_1$ or to $\beta_2$. As a result, it becomes hard to establish statistical significance of the coefficients, and the estimates can be very unstable.
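A small Monte Carlo makes this instability visible (again a sketch; the 0.99 correlation and the other numbers are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 2_000
results = {}

for rho in (0.0, 0.99):
    slopes = []
    for _ in range(reps):
        x1 = rng.normal(size=n)
        # Construct X2 with correlation rho with X1.
        x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
        y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        slopes.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    results[rho] = (np.mean(slopes), np.std(slopes))

# Both means sit near the true beta1 = 2 (no bias), but at rho = 0.99 the
# standard deviation is roughly 1/sqrt(1 - rho^2) ~ 7 times larger.
print(results)
```

Both estimators remain centered on the true $\beta_1$, so there is no bias; what collinearity destroys is precision.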

Furthermore, the estimated coefficients may be hard to interpret. Typically, we read a coefficient as saying "all else equal, a one-unit increase in $X_1$ will result in ...". However, a one-unit increase in $X_1$ with all else held equal essentially never occurs in the data, because $X_1$ and $X_2$ are so highly correlated.