Comparison Between Omitted Variable Bias and Multicollinearity in Regression

bias, linear model, multicollinearity, omitted-variable-bias, regression

There seems to be a bit of a catch-22 here: suppose I am doing linear regression and have two variables that are highly correlated. If I include both in my model, I suffer from multicollinearity, but if I leave one out, I suffer from omitted variable bias?

Best Answer

Usually, you would not care about both of them simultaneously. Depending on the goal of your analysis (say, description vs. prediction vs. causal inference), you would care about at most one of them.

Description$\color{red}{^*}$
Multicollinearity (MC) is simply a fact to be mentioned, one of the characteristics of the data to report.
The notion of omitted variable bias (OVB) does not apply to descriptive modelling. (See the definition of OVB in the Wikipedia quote provided below.) In contrast to causal modelling, the causal notion of relevance of variables does not apply to description. You can freely choose the variables you are interested in describing probabilistically (e.g. in the form of a regression), and you evaluate your model w.r.t. the chosen set of variables, not w.r.t. the variables left out.
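
A minimal sketch of what "MC as a descriptive fact to report" might look like in Python (the simulated data and the 0.9 correlation are my own choices, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)  # corr(x1, x2) ~ 0.9

# MC as a descriptive characteristic of the data: quantify and report it.
r = np.corrcoef(x1, x2)[0, 1]
vif = 1.0 / (1.0 - r**2)  # with two regressors, VIF = 1 / (1 - r^2)
print(f"corr(x1, x2) = {r:.2f}, VIF = {vif:.1f}")
```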

Prediction
MC and OVB are largely irrelevant as you are not interested in model coefficients per se, only in predictions.
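
A quick illustration of why: under strong MC, dropping a regressor changes the coefficients drastically but the predictions only slightly. A sketch on simulated data (the data-generating process and all numbers are my own assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + np.sqrt(1 - 0.95**2) * rng.normal(size=n)  # corr ~ 0.95
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)

X_full = np.column_stack([np.ones(n), x1, x2])
X_red = np.column_stack([np.ones(n), x1])  # x2 omitted

b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
b_red, *_ = np.linalg.lstsq(X_red, y, rcond=None)

print("full coefficients:   ", b_full)  # ~ [1, 2, 2]
print("reduced coefficients:", b_red)   # ~ [1, 3.9] -- x1's coefficient nearly doubles
print("full RMSE:   ", np.sqrt(np.mean((y - X_full @ b_full) ** 2)))  # ~ 1.0
print("reduced RMSE:", np.sqrt(np.mean((y - X_red @ b_red) ** 2)))    # ~ 1.2
```

The x1 coefficient is badly distorted by the omission, yet the predictive accuracy barely moves: exactly the asymmetry that makes OVB a coefficient problem, not a prediction problem.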

Causal modelling / causal inference
You may care about both MC and OVB at once when attempting causal inference. I will argue that you should actually worry about OVB but not MC. OVB results from a faulty model, not from the characteristics of the underlying phenomenon, and you can remedy it by changing the model. Meanwhile, imperfect MC can very well arise in a well-specified model as a characteristic of the underlying phenomenon. Given a well-specified model and the data that you have, there is no sound escape from MC. In that sense, you should just acknowledge it and the resulting uncertainty in your parameter estimates and inference.
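
To see the asymmetry concretely: OVB disappears once the specification is fixed, while MC stays in the data. A simulation sketch (the data-generating process is my own assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000                                 # large n so sampling noise is negligible
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)          # x2 correlated with x1 (delta = 0.8)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

# Correct specification recovers both causal effects despite the MC:
X_full = np.column_stack([np.ones(n), x1, x2])
print(np.linalg.lstsq(X_full, y, rcond=None)[0])  # ~ [1, 2, 3]

# Omitting x2 is a specification error: x1's coefficient absorbs
# part of x2's effect (beta_1 + beta_2 * delta = 2 + 3 * 0.8 = 4.4).
X_red = np.column_stack([np.ones(n), x1])
print(np.linalg.lstsq(X_red, y, rcond=None)[0])   # ~ [1, 4.4]
```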

$\color{red}{^*}$I am not 100% sure about the definition of description / descriptive modelling. In this answer, I take description to constitute probabilistic modelling of data, e.g. joint, conditional and marginal distributions and their specific features. In contrast to causal modelling, description focuses on probabilistic but not causal relationships between variables.


Edit to respond to feedback by @LSC:

In defence of my statement that OVB is largely irrelevant for prediction, let us first see what OVB is. According to Wikipedia,

In statistics, omitted-variable bias (OVB) occurs when a statistical model leaves out one or more relevant variables. The bias results in the model attributing the effect of the missing variables to the estimated effects of the included variables. More specifically, OVB is the bias that appears in the estimates of parameters in a regression analysis, when the assumed specification is incorrect in that it omits an independent variable that is a determinant of the dependent variable and correlated with one or more of the included independent variables.
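
To make the quoted definition concrete, here is the textbook two-regressor case (standard material, not part of the quote). Suppose the true model is
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon,$$
but we regress $y$ on $x_1$ alone. If $\delta_1$ denotes the slope from regressing $x_2$ on $x_1$, then
$$\operatorname{plim} \hat{\beta}_1 = \beta_1 + \beta_2 \delta_1,$$
so the bias $\beta_2 \delta_1$ is nonzero exactly when the omitted variable is both relevant ($\beta_2 \neq 0$) and correlated with an included regressor ($\delta_1 \neq 0$), matching the two conditions in the quote.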

In prediction, we do not care about the estimated effects but rather about accurate predictions. Hence my statement above that OVB is largely irrelevant for prediction.

Regarding the statement by @LSC that "OVB will necessarily introduce bias into the estimation process and can screw with predictions":

  • This is tangential to my points because I did not discuss the effect of omitting a variable on prediction. I only discussed the relevance of omitted variable bias for prediction. The two are not the same.
  • I agree that omitting a variable does affect prediction under imperfect MC. While this would not be called OVB (see the Wikipedia quote above for what OVB typically means), this is a real issue. The question is, how important is that under MC? I will argue, not so much.
  • Under MC, the information set of all the regressors and the reduced set without one regressor are close. As a consequence, the loss of predictive accuracy from omitting a regressor is small, and it shrinks with the degree of MC (see the simulation sketch after this list). This should come as no surprise: we routinely omit regressors in predictive models so as to exploit the bias-variance trade-off.
  • Also, the linear prediction is unbiased w.r.t. the reduced information set, and as I mentioned above, that information set is close to the full information set under MC. The coefficient estimators are also predictively consistent; see "T-consistency vs P-consistency" for a related point.
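
To back up the last two bullets, here is a sketch showing the predictive loss from omitting a regressor shrinking as MC strengthens (simulated data; the data-generating process $y = 2x_1 + 2x_2 + \varepsilon$ with noise standard deviation 1 is my own choice):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

for rho in [0.0, 0.5, 0.9, 0.99]:
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)  # corr(x1, x2) = rho
    y = 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)

    X_red = np.column_stack([np.ones(n), x1])  # x2 omitted
    b, *_ = np.linalg.lstsq(X_red, y, rcond=None)
    rmse = np.sqrt(np.mean((y - X_red @ b) ** 2))
    print(f"rho = {rho:4.2f}: RMSE of reduced model = {rmse:.3f} (noise sd = 1)")
```

As $\rho \to 1$, the reduced model's RMSE approaches the irreducible noise level, i.e. the predictive cost of omitting $x_2$ vanishes precisely when MC is strongest.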