Regression – Dealing with Unmeasured Confounders and Highly Correlated Controls

causality, confounding, logistic, matching, regression

I am developing two models using observational data in which I have a binary outcome, a binary treatment and a series of confounders which I control for. The only difference is that the first model uses matched data (via exact matching), while the second uses the whole sample, with no matching.

Now, there's also a variable (to which I do not have access) that some colleagues point out might be a confounder. I am not convinced it is one, at least if we define confounders in line with Pearl and Cinelli's tradition (see Confounding variables in experimental study), namely "variables that affect both the treatment and the outcome". If we relax the definition and define confounders as those variables that correlate with treatment and outcome, instead, I acknowledge the unmeasured variable I was referring to can be considered a confounder.

I know that, if this really is a confounder, my estimated effects will be biased. Yet among the other variables I control for, there are a couple that I expect to be highly correlated with this unmeasured variable. Can I say that this substantially reduces the problem of omitting the unmeasured variable?

To give a little more context, my outcome records whether a person recovered from a given condition (this is just an example of my actual problem, so please do not focus on the theoretically relevant aspects of the health problem). I know that the condition itself is highly clustered in space (i.e., within neighborhoods), and the type of neighborhood is the unmeasured variable I was referring to.

To summarize: there is almost perfect overlap between the "antecedent" of the outcome (the health condition) and the unmeasured variable (the neighborhood), and I have other variables that are highly correlated with the unmeasured neighborhood information. Assuming that a confounder implies correlation and not causation, do you think the model's estimates can be considered reliable, provided I clearly point out the limitation? How would you proceed?

Thanks for your help!

Best Answer

Although it is true that confounding is due to common causes of the treatment and outcome, a confounder does not have to cause both the treatment and the outcome. It needs to lie along an open backdoor path from the treatment to the outcome. See my answer here for a more precise definition of a confounder. Consider the following DAG (made at http://www.dagitty.net/):

[Figure: a DAG displaying confounding with extra variables included]

and in particular the chain $$ A \leftarrow X_1 \leftarrow X_2 \rightarrow X_3 \rightarrow Y $$

where $A$ is the treatment and $Y$ is the outcome.

It's true that confounding is due to $X_2$, but adjusting for any one of $X_1$, $X_2$, or $X_3$ alone would be enough to remove the confounding, which means all three of them are confounders. $X_1$ causes the treatment but is only correlated with (and does not cause) the outcome. $X_3$ causes the outcome but is only correlated with (and does not cause) the treatment. Omitting all three of these would allow confounding to remain.

Consider also the additional causal path $X_4 \rightarrow X_2$, which feeds into the chain mentioned above. $X_4$ appears to satisfy many of the supposed qualities of a confounder: it is an (indirect) cause of the treatment (and correlated with the treatment) and it is an (indirect) cause of the outcome (and correlated with the outcome). And yet it is not a confounder, because adjusting for it does not remove confounding (i.e., it is not a part of any minimally sufficient adjustment set).
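The claims about $X_1$, $X_2$, $X_3$, and $X_4$ can be checked with a small simulation. This is only a sketch under assumptions that are not in the original post: every variable is continuous and linear with Gaussian noise (the question's outcome and treatment are binary), the path coefficients are arbitrary, the true effect of $A$ on $Y$ is set to 1, and the function names are made up for illustration.

```python
import numpy as np


def ols_treatment_coef(y, a, controls):
    """Coefficient on `a` from an OLS fit of y on [intercept, a, *controls]."""
    design = np.column_stack([np.ones_like(a), a] + list(controls))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def simulate(n=200_000, true_effect=1.0, seed=0):
    """Simulate X4 -> X2, A <- X1 <- X2 -> X3 -> Y, and A -> Y.

    All coefficients are illustrative assumptions, not values from the post.
    Returns the estimated effect of A on Y under several adjustment sets.
    """
    rng = np.random.default_rng(seed)
    x4 = rng.normal(size=n)
    x2 = 0.8 * x4 + rng.normal(size=n)   # X4 -> X2
    x1 = x2 + rng.normal(size=n)         # X2 -> X1
    x3 = x2 + rng.normal(size=n)         # X2 -> X3
    a = x1 + rng.normal(size=n)          # X1 -> A
    y = true_effect * a + x3 + rng.normal(size=n)  # A -> Y and X3 -> Y

    adjustment_sets = {
        "none": [],
        "X1": [x1],
        "X2": [x2],
        "X3": [x3],
        "X4": [x4],
    }
    return {
        name: ols_treatment_coef(y, a, controls)
        for name, controls in adjustment_sets.items()
    }


if __name__ == "__main__":
    for name, estimate in simulate().items():
        print(f"adjusting for {name:>4}: estimated effect = {estimate:.3f}")
```

With these assumed coefficients, adjusting for any one of $X_1$, $X_2$, or $X_3$ should give an estimate close to the true effect of 1, while the unadjusted and the $X_4$-adjusted estimates should both sit noticeably above 1, because the backdoor path through $X_2$ stays open.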

All this is to say that knowing whether a variable causes the treatment and/or outcome, or is just correlated with them, is not enough to decide whether it is a confounder and whether it needs to be adjusted for. $X_1$ and $X_3$ are confounders, even though $X_1$ doesn't cause the outcome and $X_3$ doesn't cause the treatment. $X_4$ is not a confounder, even though it causes both the treatment and the outcome. The only way to know whether failing to adjust for a given variable will induce bias due to confounding is to make a DAG and use the DAG adjustment rules to determine whether a sufficient adjustment set omits that variable.
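As a rough illustration of what "using the DAG adjustment rules" means, here is a minimal pure-Python check of the backdoor criterion. The edge list is an assumption reconstructed from the text (the chain, $X_4 \rightarrow X_2$, and an assumed direct effect $A \rightarrow Y$), since the figure itself is not available; the helper names are hypothetical. In practice a tool such as DAGitty does this for you.

```python
# Edges (parent, child) of the DAG as described in the text; A -> Y is assumed.
EDGES = {
    ("X4", "X2"), ("X2", "X1"), ("X2", "X3"),
    ("X1", "A"), ("X3", "Y"), ("A", "Y"),
}


def descendants(node):
    """All nodes reachable from `node` by following directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in EDGES:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def neighbours(node):
    """Nodes adjacent to `node`, ignoring edge direction."""
    return ({c for p, c in EDGES if p == node}
            | {p for p, c in EDGES if c == node})


def simple_paths(start, end, path=None):
    """Yield every undirected path from `start` to `end` without repeats."""
    path = path or [start]
    if start == end:
        yield list(path)
        return
    for nxt in neighbours(start):
        if nxt not in path:
            yield from simple_paths(nxt, end, path + [nxt])


def is_blocked(path, z):
    """True if conditioning on the set `z` blocks `path` (d-separation rules)."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        is_collider = (prev, node) in EDGES and (nxt, node) in EDGES
        if is_collider:
            # A collider blocks unless it or one of its descendants is in z.
            if node not in z and not (descendants(node) & z):
                return True
        elif node in z:
            # A chain or fork node blocks when it is conditioned on.
            return True
    return False


def satisfies_backdoor(z, treatment="A", outcome="Y"):
    """Check the backdoor criterion for adjustment set `z`."""
    z = set(z)
    if z & descendants(treatment):
        return False  # must not adjust for descendants of the treatment
    backdoor = [p for p in simple_paths(treatment, outcome)
                if (p[1], treatment) in EDGES]  # path starts with an arrow into A
    return all(is_blocked(p, z) for p in backdoor)


if __name__ == "__main__":
    for candidate in [set(), {"X1"}, {"X2"}, {"X3"}, {"X4"}]:
        print(sorted(candidate), satisfies_backdoor(candidate))
```

On this assumed graph the only backdoor path is $A \leftarrow X_1 \leftarrow X_2 \rightarrow X_3 \rightarrow Y$, so any single one of $X_1$, $X_2$, $X_3$ passes the check, while the empty set and $\{X_4\}$ do not.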
