Regression – Adjusting for Path Variables and Mediators in Multiple Regression

Tags: causality, confounding, mediation, multiple regression, regression

I’ve been taught that one of the criteria for being a “confounding” variable is that it should not be a path variable / mediator (i.e. it is not a descendant of the exposure, though it is associated with the exposure and independently associated with the outcome). Adjusting for a path variable would introduce bias.

My question is: are there circumstances where adjusting for these variables would be meaningful?

My thought process is that adjusting for a path variable will “take away” from the effect of the exposure. Surely then, doing so would shed light on how much of the exposure’s effect is direct and how much is via the path variable.

Best Answer

Your intuition is correct, although of course in reality, things are a bit more complex.

Suppose you think the causal graph looks like this:

[Figure: causal graph in which $X$ confounds $D$ and $Y$, and $M$ lies on the path $D \rightarrow M \rightarrow Y$]

Then you can estimate the average causal effect of $D$ on $Y$ using various methods like regression or matching where you additionally control for $X$, which is a confounder. Technically speaking, this works because $X$ satisfies the back-door criterion: It blocks all "bad paths", blocks no causal paths from $D$ to $Y$, and opens up no new bad paths.
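For reference, the adjustment that the back-door criterion licenses can be written out explicitly. This is the standard back-door adjustment formula, shown here for a discrete $X$; the $do(\cdot)$ notation is Pearl's and is not used elsewhere in this answer:

$$P(Y = y \mid do(D = d)) = \sum_{x} P(Y = y \mid D = d, X = x)\, P(X = x)$$

In a linear model, this averaging over $X$ is what including $X$ as a regressor accomplishes.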

Obviously, adjusting for $X$ and $M$ would not give you the total causal effect of $D$ on $Y$, because you block the $D \rightarrow M \rightarrow Y$ path. However, in this case, a regression of $Y$ on $D, M, X$ would still give you a reasonable estimate of the controlled direct effect of $D$ when you fix $M$ at some value.
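A small simulation may make this concrete. The sketch below is only an illustration under assumptions that are mine, not part of the graph above: all relationships are linear with Gaussian noise, there is a direct $D \rightarrow Y$ edge, and the coefficients (0.8, 0.5, 1.0, 2.0, 1.5) are arbitrary. The helper `ols` is a hypothetical convenience wrapper around `numpy.linalg.lstsq`.

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients (intercept first) of y on the given regressors."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 200_000

# Assumed linear structural equations (illustrative coefficients only)
x = rng.normal(size=n)                                   # confounder X
d = 0.8 * x + rng.normal(size=n)                         # exposure D, caused by X
m = 0.5 * d + rng.normal(size=n)                         # mediator M, caused by D
y = 1.0 * d + 2.0 * m + 1.5 * x + rng.normal(size=n)     # outcome Y

# Adjusting for X only: coefficient on D estimates the total effect,
# direct (1.0) plus indirect via M (0.5 * 2.0), so roughly 2.0
total_effect = ols(y, d, x)[1]

# Adjusting for X and M: coefficient on D estimates the direct effect, roughly 1.0
direct_effect = ols(y, d, m, x)[1]

print(f"total effect  (Y ~ D + X):     {total_effect:.3f}")
print(f"direct effect (Y ~ D + M + X): {direct_effect:.3f}")
```

Under these assumptions the difference between the two coefficients recovers the part of the effect that runs through $M$, which is the intuition in the question. That decomposition relies on linearity and on there being no unmeasured common cause of $M$ and $Y$.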

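But suppose the world looks like this, where $U$ is unobserved:

[Figure: causal graph with $D \rightarrow M \rightarrow Y$, an unobserved $U$ affecting both $M$ and $Y$, and no direct edge from $D$ to $Y$]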

In this case, a simple regression of $Y$ on $D$ gives you the causal effect of $D$. What happens when you adjust for $M$? You block the $D \rightarrow M \rightarrow Y$ path. However, by the elementary rules of causal graphs (d-separation), conditioning on the collider $M$ also opens up the path $D \rightarrow M \leftarrow U \rightarrow Y$. This means that conditional on $M$, $D$ and $Y$ will be correlated. But then this regression is misleading, because the controlled direct effect of $D$ that does not go through $M$ is clearly 0! Unfortunately, unless you measure $U$, there is nothing you can do to find this direct effect, except for randomizing $D$ and $M$.
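The collider problem can also be seen in a simulation. Again, this is only a sketch under my own assumptions: linear relationships, Gaussian noise, arbitrary coefficients, and no direct $D \rightarrow Y$ edge. The helper `ols` is a hypothetical wrapper around `numpy.linalg.lstsq`, and the "oracle" regression uses $U$ purely to show what would happen if it could be measured.

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients (intercept first) of y on the given regressors."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 200_000

# Assumed linear structural equations (illustrative coefficients only)
u = rng.normal(size=n)                          # U, unobserved in practice
d = rng.normal(size=n)                          # exposure D, independent of U
m = 1.0 * d + 1.0 * u + rng.normal(size=n)      # mediator M, caused by D and U
y = 2.0 * m + 1.5 * u + rng.normal(size=n)      # outcome Y; no direct D -> Y edge

# Unadjusted: coefficient on D is the causal effect, roughly 1.0 * 2.0 = 2.0
unadjusted = ols(y, d)[1]

# Adjusting for M opens D -> M <- U -> Y: the coefficient on D is roughly -0.75
# here, even though the true direct effect is 0
adjusted_for_m = ols(y, d, m)[1]

# Oracle (only possible if U were measured): coefficient on D is roughly 0
oracle = ols(y, d, m, u)[1]

print(f"Y ~ D:         {unadjusted:.3f}")
print(f"Y ~ D + M:     {adjusted_for_m:.3f}")
print(f"Y ~ D + M + U: {oracle:.3f}")
```

With these particular coefficients the spurious "direct effect" is negative, but its sign and size depend entirely on how $U$ enters $M$ and $Y$, so the $M$-adjusted estimate cannot be corrected without information about $U$.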

This is a common and under-appreciated problem in the analysis of mediation. For more, see for example Pearl, Judea. "Interpretation and identification of causal mediation." Psychological Methods 19.4 (2014): 459.
