Suppose that you are performing a linear regression examining the main effect $x_1$ and want to adjust for possible confounders $x_2, x_3, x_4$. Is it better to have an unadjusted model and a model adjusted for all potential confounders? Or should you also consider models adjusted for only some of the confounders (e.g. $x_2$, $x_2$ and $x_3$, etc.)?
Solved – Including confounders in a model
confounding, model-selection, regression
Best Answer
I assume you're trying to estimate the causal effect of $x_1$ on $y$, rather than just trying to predict $y$. In general, to find a correct conditioning set (if one exists), you need to know the causal relationships among the variables. Here is an example to illustrate why:
Consider a causal graph in which the true effect of $X_1$ on $Y$ is zero, but there are two unobserved confounders $U_1$ and $U_2$ and one observed confounder $X_2$ (with $U_1$ and $U_2$ both causes of $X_2$, so $X_2$ is a collider on the path between them). If we had observed $U_1$ and $U_2$, we could condition on all three confounders and get an unbiased estimate. However, since we only observed $X_2$, we are stuck. If we don't control for $X_2$, it will confound our estimate of the effect of $X_1$ on $Y$. But if we do control for it, we induce an association between $U_1$ and $U_2$, and therefore between $X_1$ and $Y$.
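You can see both biases in a quick simulation. The data-generating process below is a hypothetical one I chose to match the structure described above ($U_1 \to X_2 \leftarrow U_2$, $X_2 \to X_1$, $X_2 \to Y$, $U_1 \to X_1$, $U_2 \to Y$, and no effect of $X_1$ on $Y$); the specific unit coefficients are my own illustrative choice, not from the answer:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical DGP consistent with the graph sketched above.
# The true causal effect of x1 on y is zero.
u1 = rng.normal(size=n)            # unobserved confounder
u2 = rng.normal(size=n)            # unobserved confounder
x2 = u1 + u2 + rng.normal(size=n)  # observed confounder; collider of u1, u2
x1 = u1 + x2 + rng.normal(size=n)
y  = u2 + x2 + rng.normal(size=n)  # no x1 term: true effect is 0

def ols(y, *cols):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_unadj = ols(y, x1)[1]      # omits x2: biased upward by confounding
b_adj   = ols(y, x1, x2)[1]  # includes x2: biased downward by collider bias

print(f"unadjusted estimate of x1 effect: {b_unadj:+.3f}")
print(f"estimate adjusted for x2:         {b_adj:+.3f}")
```

With these coefficients the unadjusted estimate converges to $5/7 \approx 0.71$ and the adjusted one to $-0.2$, so neither model recovers the true effect of zero: adjusting for $X_2$ trades confounding bias for collider bias.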
There are many cases in which you should not control for a particular covariate. If you don't know the causal structure, controlling for the wrong set of covariates may accidentally bias your estimate. In that situation you can apply a causal structure learning (causal discovery) algorithm as a first step, before trying to estimate the causal effect of $x_1$ on $y$.