There seems to be a bit of a catch-22: suppose I am doing linear regression and I have 2 variables that are highly correlated. If I use both in my model, I will suffer from multicollinearity, but if I include only one of them, I will suffer from omitted variable bias?
Omitted Variable Bias – Comparison Between Omitted Variable Bias and Multicollinearity in Regression
bias, linear model, multicollinearity, omitted-variable-bias, regression
Related Solutions
The main issue here is the nature of the omitted variable bias. Wikipedia states:
Two conditions must hold true for omitted-variable bias to exist in linear regression:
- the omitted variable must be a determinant of the dependent variable (i.e., its true regression coefficient is not zero); and
- the omitted variable must be correlated with one or more of the included independent variables (i.e. cov(z,x) is not equal to zero).
It's important to carefully note the second criterion. Your betas will only be biased under certain circumstances. Specifically, if there are two variables that contribute to the response that are correlated with each other, but you only include one of them, then (in essence) the effects of both will be attributed to the included variable, causing bias in the estimation of that parameter. So perhaps only some of your betas are biased, not necessarily all of them.
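To make the mechanism concrete, here is a sketch under the standard textbook assumptions (the notation is mine, not from the quote above): suppose the true model is $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$, but you regress $y$ on $x_1$ alone. The OLS slope then satisfies
$$E[\hat\beta_1] = \beta_1 + \beta_2\,\frac{\operatorname{cov}(x_1,x_2)}{\operatorname{var}(x_1)},$$
so the bias term vanishes exactly when $\beta_2 = 0$ or $\operatorname{cov}(x_1,x_2)=0$, which is just the pair of Wikipedia conditions above.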
Another disturbing possibility is that if your sample is not representative of the population (which it rarely really is), and you omit a relevant variable, even if it's uncorrelated with the other variables, this could cause a vertical shift which biases your estimate of the intercept. For example, imagine a variable, $Z$, increases the level of the response, and that your sample is drawn from the upper half of the $Z$ distribution, but $Z$ is not included in your model. Then, your estimate of the population mean response (and the intercept) will be biased high despite the fact that $Z$ is uncorrelated with the other variables. Additionally, there is the possibility that there is an interaction between $Z$ and variables in your model. This can also cause bias without $Z$ being correlated with your variables (I discuss this idea in my answer here.)
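Here is a minimal simulation sketch of that vertical shift (the names and numbers are my own illustration): $Z$ is independent of $X$, the sample keeps only observations with $Z > 0$, and $Z$ is omitted from the fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# True model: y = 1 + 2*x + 3*z + noise, with x and z independent
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 1 + 2 * x + 3 * z + rng.normal(size=n)

# Keep only the upper half of the Z distribution, then omit z from the model
keep = z > 0
X_design = np.column_stack([np.ones(keep.sum()), x[keep]])
beta_hat, *_ = np.linalg.lstsq(X_design, y[keep], rcond=None)

# Slope stays near 2 (z is uncorrelated with x), but the intercept is
# pushed up to roughly 1 + 3*E[z | z > 0] ≈ 3.4 instead of 1
print(beta_hat)
```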
Now, given that, in its equilibrium state, everything in the world is ultimately correlated with everything else, we might find this all very troubling. Indeed, when doing observational research, it is best to always assume that every variable is endogenous.
There are, however, limits to this (cf. Cornfield's Inequality). First, conducting true experiments breaks the correlation between a focal variable (the treatment) and any otherwise relevant, but unobserved, explanatory variables. Second, some statistical techniques can be used with observational data to account for such unobserved confounds (prototypically, instrumental variables regression, but others as well).
Setting these possibilities aside (they probably do represent a minority of modeling approaches), what is the long-run prospect for science? This depends on the magnitude of the bias and the volume of exploratory research that gets done. Even if the numbers are somewhat off, they may often be in the neighborhood, and sufficiently close that relationships can be discovered. Then, in the long run, researchers can become clearer on which variables are relevant. Indeed, modelers sometimes explicitly trade off increased bias for decreased variance in the sampling distributions of their parameters (cf. my answer here). In the short run, it's worth always remembering the famous quote from Box:
All models are wrong, but some are useful.
There is also a potentially deeper philosophical question here: What does it mean that the estimate is being biased? What is supposed to be the 'correct' answer? If you gather some observational data about the association between two variables (call them $X$ & $Y$), what you are getting is ultimately the marginal correlation between those two variables. This is only the 'wrong' number if you think you are doing something else, and getting the direct association instead. Likewise, in a study to develop a predictive model, what you care about is whether, in the future, you will be able to accurately guess the value of an unknown $Y$ from a known $X$. If you can, it doesn't matter if that's (in part) because $X$ is correlated with $Z$ which is contributing to the resulting value of $Y$. You wanted to be able to predict $Y$, and you can.
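A small sketch of that predictive point (variable names and coefficients are hypothetical): even though the fitted coefficient on $X$ absorbs part of $Z$'s effect, out-of-sample predictions of $Y$ from $X$ alone can still be quite good, as long as the $X$–$Z$ relationship stays stable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Z drives Y and is correlated with X; only X is observed at prediction time
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(scale=0.6, size=n)
y = 1 + 0.5 * x + 2.0 * z + rng.normal(size=n)

# Fit y ~ x on one half, predict on the held-out half
train = np.arange(n) < n // 2
test = ~train
X_train = np.column_stack([np.ones(train.sum()), x[train]])
b, *_ = np.linalg.lstsq(X_train, y[train], rcond=None)
pred = b[0] + b[1] * x[test]

print(b[1])                              # ~2.1, far from the 'direct' 0.5: X absorbs Z's effect
print(np.corrcoef(pred, y[test])[0, 1])  # yet predictions track Y fairly well (~0.8)
```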
The case of "attenuation bias" can be presented more clearly if we examine the probit model, but the result carries over to logistic regression as well.
Underlying the conditional probability models (the logit, probit, and linear probability models), we can postulate a latent (unobservable) linear regression model:
$$y^* = X\beta + u$$
where $y^*$ is a continuous unobservable variable (and $X$ is the regressor matrix). The error term is assumed to be independent of the regressors and to follow a distribution with a density symmetric around zero; in our case, the standard normal, $F_U(u)= \Phi(u)$.
We assume that what we observe, i.e. the binary variable $y$, is an indicator function of the unobservable $y^*$:
$$ y = 1 \;\;\text{if} \;\;y^*>0,\qquad y = 0 \;\;\text{if}\;\; y^*\le 0$$
Then we ask "what is the probability that $y$ will take the value $1$ given the regressors?" (i.e. we are looking at a conditional probability). This is
$$P(y =1\mid X ) = P(y^*>0\mid X) = P(X\beta + u>0\mid X) = P(u> - X\beta\mid X) \\= 1- \Phi (-X\beta) = \Phi (X\beta) $$
the last equality due to the "reflective" property of the standard cumulative distribution function, which comes from the symmetry of the density function around zero. Note that although we have assumed that $u$ is independent of $X$, conditioning on $X$ is needed in order to treat the quantity $X\beta$ as non-random.
If we assume that $X\beta = b_0+b_1X_1 + b_2X_2$, then we obtain the theoretical model
$$P(y =1\mid X ) = \Phi (b_0+b_1X_1 + b_2X_2) \tag{1}$$
Now let $X_2$ be independent of $X_1$ but erroneously excluded from the specification of the underlying regression. So we specify
$$y^* = b_0+b_1X_1 + \epsilon$$ Assume further that $X_2$ is also a normal random variable $X_2 \sim N(\mu_2,\sigma_2^2)$. But this means that
$$\epsilon = u + b_2X_2 \sim N(b_2\mu_2, 1+b_2^2\sigma_2^2)$$
due to the closure-under-addition of the normal distribution (and the independence assumption). Applying the same logic as before, here we have
$$P(y =1\mid X_1 ) = P(y^*>0\mid X_1) = P(b_0+b_1X_1 + \epsilon>0\mid X_1) = P(\epsilon> - b_0-b_1X_1\mid X_1) $$
Standardizing the $\epsilon$ variable we have
$$P(y =1\mid X_1 )= 1- P\left(\frac{\epsilon-b_2\mu_2}{\sqrt {1+b_2^2\sigma_2^2}}\leq - \frac {(b_0 + b_2\mu_2)}{\sqrt {1+b_2^2\sigma_2^2}}- \frac {b_1}{\sqrt {1+b_2^2\sigma_2^2}}X_1\mid X_1\right)$$
$$\Rightarrow P(y =1\mid X_1) = \Phi\left(\frac {(b_0 + b_2\mu_2)}{\sqrt {1+b_2^2\sigma_2^2}}+ \frac {b_1}{\sqrt {1+b_2^2\sigma_2^2}}X_1\right) \tag{2}$$
and one can compare models $(1)$ and $(2)$.
The above theoretical expression tells us where our maximum likelihood estimator of $b_1$ is going to converge. It remains a consistent estimator in the sense that it will converge to the theoretical quantity that actually appears in the misspecified model (not, of course, in the sense that it will recover the "truth"):
$$\hat b_1 \xrightarrow{p} \frac {b_1}{\sqrt {1+b_2^2\sigma_2^2}} \implies |\operatorname{plim}\, \hat b_1|< |b_1|$$
which is the "bias towards zero" result.
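A quick Monte Carlo sketch of this result (the parameter values are arbitrary, and statsmodels' Probit is just one way to fit the misspecified model): with $b_1 = 1$, $b_2 = 1.5$ and $\sigma_2 = 1$, the theoretical limit is $1/\sqrt{1 + 2.25} \approx 0.55$.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 200_000

# True latent model: y* = b0 + b1*x1 + b2*x2 + u,  u ~ N(0, 1),  x2 independent of x1
b0, b1, b2, mu2, sigma2 = 0.2, 1.0, 1.5, 0.5, 1.0
x1 = rng.normal(size=n)
x2 = rng.normal(loc=mu2, scale=sigma2, size=n)
y = (b0 + b1 * x1 + b2 * x2 + rng.normal(size=n) > 0).astype(int)

# Misspecified probit: omit x2
res = sm.Probit(y, sm.add_constant(x1)).fit(disp=0)

print(res.params[1])                        # estimated slope on x1, close to...
print(b1 / np.sqrt(1 + b2**2 * sigma2**2))  # ...the attenuated limit ≈ 0.555, not b1 = 1
```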
We used the probit model, and not the logit (logistic regression), because only under normality can we derive the distribution of $\epsilon$. The logistic distribution is not closed under addition. This means that if we omit a relevant variable in logistic regression, we also create distributional misspecification, because the error term (that now includes the omitted variable) no longer follows a logistic distribution. But this does not change the bias result (see footnote 6 in the paper linked to by the OP).
Best Answer
Usually, you would not care about both of them simultaneously. Depending on the goal of your analysis (say, description vs. prediction vs. causal inference), you would care about at most one of them.
Description$\color{red}{^*}$
Multicollinearity (MC) is just a fact to be mentioned, just one of the characteristics of the data to report.
The notion of omitted variable bias (OVB) does not apply to descriptive modelling. (See the definition of OVB in the Wikipedia quote provided above.) In contrast to causal modelling, the causal notion of the relevance of variables does not apply to description. You can freely choose the variables you are interested in describing probabilistically (e.g. in the form of a regression), and you evaluate your model w.r.t. the chosen set of variables, not with respect to variables you did not choose.
Prediction
MC and OVB are largely irrelevant as you are not interested in model coefficients per se, only in predictions.
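As a sketch of why MC is largely harmless for prediction (the setup and numbers are illustrative): with two nearly collinear regressors, the individual coefficients bounce around wildly across bootstrap resamples, while the fitted value at a given point barely moves.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200

# Two highly collinear predictors
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # corr(x1, x2) close to 1
y = 1 + x1 + x2 + rng.normal(size=n)
X_design = np.column_stack([np.ones(n), x1, x2])
x_new = np.array([1.0, 0.5, 0.5])          # a fixed point at which to predict

coefs, preds = [], []
for _ in range(500):                        # bootstrap resamples
    idx = rng.integers(0, n, n)
    b, *_ = np.linalg.lstsq(X_design[idx], y[idx], rcond=None)
    coefs.append(b[1])
    preds.append(x_new @ b)

print(np.std(coefs))   # the coefficient on x1 is highly unstable
print(np.std(preds))   # the prediction at x_new is stable
```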
Causal modelling / causal inference
You may care about both MC and OVB at once when attempting to do causal inference. I will argue that you should actually worry about the OVB but not MC. OVB results from a faulty model, not from the characteristics of the underlying phenomenon. You can remedy it by changing the model. Meanwhile, imperfect MC can very well arise in a well specified model as a characteristic of the underlying phenomenon. Given the well specified model and the data that you have, there is no sound escape from MC. In that sense you should just acknowledge it and the resulting uncertainty in your parameter estimates and inference.
$\color{red}{^*}$I am not 100% sure about the definition of description / descriptive modelling. In this answer, I take description to constitute probabilistic modelling of data, e.g. joint, conditional and marginal distributions and their specific features. In contrast to causal modelling, description focuses on probabilistic but not causal relationships between variables.
Edit to respond to feedback by @LSC:
In defence of my statement that OVB is largely irrelevant for prediction, let us first recall what OVB is. According to the Wikipedia conditions quoted above, OVB arises when an omitted variable is both a determinant of the dependent variable and correlated with the included regressors, in which case the estimated coefficients on the included variables are biased.
In prediction, we do not care about the estimated effects but rather about accurate predictions. Hence, my statement above should be clear.
Regarding @LSC's statement that "OVB will necessarily introduce bias into the estimation process and can screw with predictions":