Solved – Variance of OLS Coefficients with Omitted Variables

inference, mathematical-statistics, regression

I've read in several sources, for example here, page 51, that if you omit a relevant variable from an OLS regression, the resulting standard errors are smaller. Their argument makes sense, but I'm trying to reconcile it with the following reasoning, which suggests to me that the standard errors should be higher.

Suppose the real model is

$$y = X\beta + Z\delta + \epsilon $$

but instead we estimate

$$\tilde y = X\beta + \mu$$

where $\epsilon, \mu$ are IID white noise.

My argument is that by not including the relevant variable $Z$, your mean squared error will be greater. The MSE is also your estimator of the variance of the noise term, and the variance of the noise term enters the estimated standard errors of the coefficients; therefore the estimated standard errors should be higher (and the variance estimates will also be biased upward).
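To make this concrete: if, say, $Z$ were uncorrelated with $X$ (an illustrative special case, not something assumed above), the short regression's error variance estimate would converge to

$$\operatorname{plim}\,\hat\sigma^2_\mu = \sigma^2_\epsilon + \delta^2 Var(Z) > \sigma^2_\epsilon,$$

so the estimated noise variance is inflated by the omitted term.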

The other thing is: suppose I actually calculate the variance of the estimated coefficients, which is

$$Var\big((X'X)^{-1}X'\tilde y\big) = (X'X)^{-1}X'Var(\tilde y)X(X'X)^{-1}$$

$$Var(\tilde y) = Var(X\beta + \mu) = Var(\mu)$$

Since $\mu = Z \delta + \epsilon$ we have that

$$Var(\tilde y) = Var(Z\delta + \epsilon) = \delta^2Var(Z) + Var(\epsilon)$$

I used the fact that $Z$ and $\epsilon$ are independent, and that since I don't observe $Z$, I treat it as a random variable.

Either way, both of these arguments point me to the conclusion that if you omit a variable, the standard errors of the coefficients should be greater, not smaller. How do I make sense of this?

Best Answer

Generally speaking, omitting a relevant explanatory variable from the regression model will increase the error variance, which suggests an increase in the variance of our OLS regression coefficients.

In your example, you have moved from the multiple regression case to the simple regression case, so the variance formula for the OLS regression coefficients has changed. The difference is that under the simple regression case the variance formula does not include the $(1 - R_j^2)$ term in the denominator, where $R_j^2$ is the R-squared from regressing $x_j$ on the other explanatory variables.
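Concretely, under homoskedasticity and in the usual textbook notation (with $SST_j = \sum_i (x_{ij} - \bar{x}_j)^2$ and $R_j^2$ as above), the two formulas are

$$Var(\hat\beta_j) = \frac{\sigma^2}{SST_j\,(1 - R_j^2)} \quad \text{(multiple regression)}, \qquad Var(\tilde\beta_j) = \frac{\sigma^2_\mu}{SST_j} \quad \text{(simple regression)},$$

where $\sigma^2_\mu \geq \sigma^2$ because the omitted variable is absorbed into the error term.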

If there is some correlation between your explanatory variables in the multiple regression case, then the denominator of the variance formula will be greater under the simple regression case (assuming the total sum of squares for your explanatory variable remains constant), which pushes the coefficient variance down.

So the answer is: it depends on how much your error variance increases from omitting the explanatory variable, relative to the size of the correlation between your explanatory variables.
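To see both effects at once, here is a minimal simulation sketch (the sample size, the correlation $\rho$ between $x$ and $z$, and the coefficient values are illustrative assumptions, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 2000
rho, delta = 0.9, 1.0  # correlation between x and z; effect of the omitted z

se_long, se_short = [], []
for _ in range(reps):
    x = rng.standard_normal(n)
    z = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    y = 1.0 * x + delta * z + rng.standard_normal(n)

    # Long regression: y on [1, x, z] -- the true model
    X = np.column_stack([np.ones(n), x, z])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    s2 = np.sum((y - X @ b) ** 2) / (n - X.shape[1])
    se_long.append(np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1]))

    # Short regression: y on [1, x] -- z folded into the error term
    Xs = np.column_stack([np.ones(n), x])
    bs = np.linalg.lstsq(Xs, y, rcond=None)[0]
    s2s = np.sum((y - Xs @ bs) ** 2) / (n - Xs.shape[1])
    se_short.append(np.sqrt(s2s * np.linalg.inv(Xs.T @ Xs)[1, 1]))

print("mean estimated SE of beta_x, long model :", np.mean(se_long))
print("mean estimated SE of beta_x, short model:", np.mean(se_short))
```

With $\rho = 0.9$ the short model's standard error should come out smaller on average, despite its larger residual variance; setting $\rho$ near zero should flip the comparison, since the $(1 - R_j^2)$ factor approaches one while the inflated error variance remains.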
