Causality – Difference Between Omitted Variable Bias and Confounding

causality, confounding, omitted-variable-bias

Is there a difference between omitted variable bias and confounding bias in linear models?

To my knowledge, when investigating the causal effect of $X$ on $Y$, a confounder is a variable $Z$ that is causally related to both $X$ and $Y$, with the corresponding DAG: $Z\rightarrow X\rightarrow Y \leftarrow Z$.

But why does the OVB formula, commonly derived as $\hat{\beta} = \beta + \gamma\cdot cov(X,Z)/var(X) = \beta + \gamma\kappa$, consist of the effect $\gamma$ from the regression of $Y$ on $Z, X$ and the effect $\kappa$ of $Z$ on $X$, instead of $X$ on $Z$?

Edit, spelling out the classic OVB example and cleaning up notation/correcting mistakes:

$$ Y = \beta X + \gamma Z + \epsilon \\ Y = \hat{\beta} X + \hat{\epsilon} $$

Best Answer

Omitted variable bias (OVB) is agnostic to the causal relationship between $X$ and $Z$. It concerns only the ability to estimate $\tau$, the coefficient on $X$ in the structural model for $Y$ (written $\beta$ in the question). The joint distribution of $Y$, $X$, and $Z$ is compatible both with a data-generating process in which $Z$ is a confounder of the $X \rightarrow Y$ relationship, so that $\tau$ represents the total effect of $X$ on $Y$, and with a data-generating process in which $Z$ is a mediator of the $X \rightarrow Y$ relationship, so that $\tau$ represents the direct effect of $X$ on $Y$.

In the confounding model, the data-generating process for $X$ and $Z$ is (here $\gamma$ denotes the effect of $Z$ on $X$, not the coefficient on $Z$ in the question's model for $Y$):

$$ Z := \epsilon_Z \\ X := \gamma Z + \epsilon_X $$

In the mediation model, the data-generating process for $X$ and $Z$ is:

$$ Z := \alpha X + \epsilon_Z \\ X := \epsilon_X $$

For the confounding process, omitting $Z$ from the model for $Y$ yields a biased estimate of $\tau$, the total effect of $X$ on $Y$. This is the classic bias due to an omitted confounder.
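A quick simulation can make the confounding case concrete. This is only a sketch under stated assumptions: the outcome model is taken to be $Y := \tau X + \delta Z + \epsilon_Y$, where `delta` is a hypothetical name for the coefficient on $Z$ (the answer does not name it), all noise terms are independent standard normals, and the parameter values are arbitrary. With these values the population slope of the regression omitting $Z$ is $1 + 2 \cdot 0.4 = 1.8$ rather than $\tau = 1$.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def simulate_confounding(n=50_000, tau=1.0, delta=2.0, gamma=0.5, seed=0):
    """Z confounds X -> Y; all parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]                  # Z := eps_Z
    x = [gamma * zi + rng.gauss(0, 1) for zi in z]           # X := gamma*Z + eps_X
    y = [tau * xi + delta * zi + rng.gauss(0, 1)             # Y := tau*X + delta*Z + eps_Y
         for xi, zi in zip(x, z)]
    # Slope of the "short" regression of Y on X alone (Z omitted).
    short_slope = cov(x, y) / cov(x, x)
    # What the OVB formula predicts for that slope.
    ovb_prediction = tau + delta * cov(x, z) / cov(x, x)
    return short_slope, ovb_prediction

short_slope, ovb_prediction = simulate_confounding()
print(f"short-regression slope:        {short_slope:.3f}")
print(f"tau + delta*cov(X,Z)/var(X):   {ovb_prediction:.3f}")
```

The two printed numbers should agree up to sampling noise, and both should sit well away from the true `tau`.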

For the mediation process, the $X \rightarrow Y$ relationship is not confounded. The estimated coefficient $\hat \tau$ in the model omitting $Z$ is unbiased for the total causal effect of $X$ on $Y$. However, it is biased for $\tau$, the direct effect of $X$ on $Y$.
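The mediation case can be checked the same way. Again a sketch, assuming the outcome model $Y := \tau X + \delta Z + \epsilon_Y$ with `delta` a hypothetical name for the coefficient on $Z$, independent standard normal noise, and arbitrary parameter values. Under these assumptions the total effect of $X$ is $\tau + \delta\alpha = 1 + 2 \cdot 0.5 = 2$ and the direct effect is $\tau = 1$.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def simulate_mediation(n=50_000, tau=1.0, delta=2.0, alpha=0.5, seed=1):
    """Z mediates X -> Y; all parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]                  # X := eps_X
    z = [alpha * xi + rng.gauss(0, 1) for xi in x]           # Z := alpha*X + eps_Z
    y = [tau * xi + delta * zi + rng.gauss(0, 1)             # Y := tau*X + delta*Z + eps_Y
         for xi, zi in zip(x, z)]
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    # Regression of Y on X alone: targets the total effect tau + delta*alpha.
    short_coef = sxy / sxx
    # Coefficient on X in the regression of Y on X and Z: targets the direct effect tau.
    long_coef = (szz * sxy - sxz * szy) / (sxx * szz - sxz ** 2)
    return short_coef, long_coef

short_coef, long_coef = simulate_mediation()
print(f"omitting Z  (total effect):  {short_coef:.3f}")
print(f"including Z (direct effect): {long_coef:.3f}")
```

Which of the two numbers counts as "biased" depends entirely on whether the target is the total or the direct effect.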

This is all to say that it's possible to have OVB without confounding if the coefficient you are trying to estimate is a direct effect, in which case omitting the mediator yields a biased estimate of this quantity. In the absence of confounding, the model omitting the mediator yields the total effect. The formula for the bias is the same regardless of the data-generating process of $X$ and $Z$, but the interpretation of the biased parameter depends on the causal relationship between $X$ and $Z$.
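To see why the formula is the same in both cases, it may help to write the derivation out. As an assumption for illustration, take the structural model for $Y$ to be $Y := \tau X + \delta Z + \epsilon_Y$ with $cov(X, \epsilon_Y) = 0$, where $\delta$ is a label introduced here for the coefficient on $Z$. The slope from regressing $Y$ on $X$ alone then converges to:

$$ \frac{cov(X, Y)}{var(X)} = \frac{cov(X, \tau X + \delta Z + \epsilon_Y)}{var(X)} = \tau + \delta \, \frac{cov(X, Z)}{var(X)} $$

This step uses only the equation for $Y$. The ratio $cov(X,Z)/var(X)$ is the slope from a regression of $Z$ on $X$, a purely statistical quantity that exists whichever way the arrow between $X$ and $Z$ points. Only its value in terms of structural parameters differs. In the confounding model:

$$ \frac{cov(X, Z)}{var(X)} = \frac{\gamma \, var(\epsilon_Z)}{\gamma^2 \, var(\epsilon_Z) + var(\epsilon_X)} $$

In the mediation model:

$$ \frac{cov(X, Z)}{var(X)} = \frac{\alpha \, var(X)}{var(X)} = \alpha $$

so the regression omitting $Z$ recovers $\tau + \delta\alpha$, which is the total effect of $X$ on $Y$.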