Pearl, Causal Inference in Statistics Q3.5.1 (Backdoor criterion)

causal-diagram, causality, dag

This is a question about the backdoor criterion (as per J. Pearl) for finding causal effects. It is linked to a specific exercise in a specific book, but I hope it is sufficiently generic and self-contained to be of general use.

Problem statement

I am self-studying Pearl, Glymour and Jewell, Causal Inference in Statistics: A Primer, and I am not quite sure about Q3.5.1(b). There we are given a causal diagram

[Figure: causal diagram for the exercise (image not reproduced)]

and asked to find the $z$-specific effect of $X$ on $Y$, i.e.:

$$
P\left[Y=y\, \Big|\,do\left(X=x\right),\,Z=z\right]
$$

As soon as we condition on $Z$, we create a constraint that correlates $B$ and $C$, and thus open the backdoor path $X - A - B - Z - C - D - Y$. To estimate the effect of $X$ on $Y$, that backdoor path needs to be blocked. In my understanding, this can be done by conditioning on any one of $A$, $B$, $C$ or $D$.
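The collider effect can be checked numerically. Below is a minimal sketch that keeps only the fragment $B \to Z \leftarrow C$; the probabilities `P_B`, `P_C` and the function `p_z1` are made-up values for illustration, not taken from the book.

```python
from itertools import product

# Hypothetical parameters (assumed for illustration only).
P_B, P_C = 0.5, 0.4          # P(B = 1), P(C = 1); B and C are independent


def p_z1(b, c):
    """Assumed P(Z = 1 | B = b, C = c) for the collider B -> Z <- C."""
    return 0.1 + 0.4 * b + 0.4 * c


# Exact joint distribution over (b, c, z).
joint = {}
for b, c, z in product((0, 1), repeat=3):
    pb = P_B if b else 1 - P_B
    pc = P_C if c else 1 - P_C
    pz = p_z1(b, c) if z else 1 - p_z1(b, c)
    joint[(b, c, z)] = pb * pc * pz


def p_c1_given(b, z=None):
    """P(C = 1 | B = b), or P(C = 1 | B = b, Z = z) when z is given."""
    rows = {k: p for k, p in joint.items()
            if k[0] == b and (z is None or k[2] == z)}
    return sum(p for k, p in rows.items() if k[1] == 1) / sum(rows.values())


if __name__ == "__main__":
    print("P(C=1 | B=0), P(C=1 | B=1):", p_c1_given(0), p_c1_given(1))
    print("P(C=1 | B=0, Z=1), P(C=1 | B=1, Z=1):",
          p_c1_given(0, 1), p_c1_given(1, 1))
```

Without conditioning on $Z$, the two values of $P[C=1 \mid B=b]$ coincide; once $Z=1$ is fixed they differ, which is the dependence between $B$ and $C$ described above.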

Model solution

I also have model solutions (found online). There, only one option is mentioned – to condition on $C$:

$$
P\left[Y=y\, \Big|\,do\left(X=x\right),\,Z=z\right]=\sum_{c} P\left[Y=y\, \Big|\,X=x,\,Z=z,\,C=c\right]\cdot P\left[C=c\right]
$$

Attempt to explain the model solution

Is there a reason why conditioning on $C$ is given as the sole solution? I can rule out conditioning on $A$ or $D$, since those are not independent variables. That leaves the question of whether I could condition on $B$.

One way I can think of to explain why conditioning on $B$ would not work is to note that the causal effect corresponds to a conditional probability on a modified diagram:

[Figure: the modified diagram (image not reproduced)]

$$
P\left[Y=y\, \Big|\,do\left(X=x\right),\,Z=z\right]=P_m\left[Y=y\, \Big|\,X=x,\,Z=z\right]
$$

Now, I can express this as:

$$
\begin{align}
P_m\left[Y=y\, \Big|\,X=x,\,Z=z\right] &= \sum_{b} P_m\left[Y=y\, \Big|\,X=x,\,Z=z,\,B=b\right]\cdot P_m\left[B=b\right] \\
&=\sum_{c} P_m\left[Y=y\, \Big|\,X=x,\,Z=z,\,C=c\right]\cdot P_m\left[C=c\right]
\end{align}
$$

Since $B$ is independent, its probability is not affected by the modification of the diagram, so $P_m\left[B=b\right]=P\left[B=b\right]$, and the same holds for $C$.

When it comes to conditional probability, we can use the fact that with fixed $Z=z$ there is no causal link from $C$ to $X$, thus conditioning on $C$ is the same on both the original and the modified diagram: $P_m\left[Y=y\, \Big|\,X=x,\,Z=z,\,C=c\right]=P\left[Y=y\, \Big|\,X=x,\,Z=z,\,C=c\right]$. This logic would not work for $B$ since $B$ does affect $X$ on the original diagram. Therefore we can only condition on $C$:

$$
\begin{align}
P\left[Y=y\, \Big|\,do\left(X=x\right),\,Z=z\right]&=P_m\left[Y=y\, \Big|\,X=x,\,Z=z\right] \\
&=\sum_{c} P_m\left[Y=y\, \Big|\,X=x,\,Z=z,\,C=c\right]\cdot P_m\left[C=c\right] \\
&=\sum_{c} P\left[Y=y\, \Big|\,X=x,\,Z=z,\,C=c\right]\cdot P\left[C=c\right]
\end{align}
$$

Does this make sense?

Best Answer

No, you were right to begin with: you can control for any variable along the backdoor path, so long as doing so doesn't open up new such paths.

You can try it for yourself on this specific diagram here (set Z to adjusted, then adjust for one other variable, and see that only the causal path remains colored): http://dagitty.net/dags.html?id=331
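The same claim can also be checked by brute-force enumeration. The sketch below is only an illustration under stated assumptions: the figure is not reproduced here, so the edge list in the first comment is an assumed DAG that contains the backdoor path from the question, and every conditional probability is a made-up value. The helper names (`joint`, `cond`, `adjusted`, `truth`) are hypothetical. It compares the interventional quantity $P[Y=1 \mid do(X=x), Z=z]$ with the adjustment sum taken over each of $A$, $B$, $C$, $D$ in turn.

```python
from itertools import product

# ASSUMED DAG (the original figure is not available here):
#   B -> A, B -> Z, C -> Z, C -> D, A -> X, Z -> X, Z -> Y, D -> Y, X -> Y
# All conditional probabilities below are made-up illustrative values.
NAMES = ("A", "B", "C", "D", "X", "Y", "Z")
INDEX = {name: i for i, name in enumerate(NAMES)}


def bern(p_one, value):
    """P(V = value) for a binary V with P(V = 1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(do_x=None):
    """Exact joint table; if do_x is given, X is forced to that value."""
    table = {}
    for a, b, c, d, x, y, z in product((0, 1), repeat=7):
        p = bern(0.5, b) * bern(0.4, c)
        p *= bern(0.2 + 0.6 * b, a)
        p *= bern(0.1 + 0.4 * b + 0.4 * c, z)
        p *= bern(0.3 + 0.5 * c, d)
        if do_x is None:
            p *= bern(0.1 + 0.5 * a + 0.3 * z, x)  # observational mechanism
        else:
            p *= 1.0 if x == do_x else 0.0         # intervention do(X = do_x)
        p *= bern(0.1 + 0.3 * x + 0.2 * z + 0.3 * d, y)
        table[(a, b, c, d, x, y, z)] = p
    return table


def prob(table, **fixed):
    """Probability that the named variables take the given values."""
    return sum(p for row, p in table.items()
               if all(row[INDEX[n]] == v for n, v in fixed.items()))


def cond(table, target, given):
    """P(target | given), both passed as {name: value} dicts."""
    return prob(table, **target, **given) / prob(table, **given)


def adjusted(table, s, x, z):
    """sum_v P(Y=1 | X=x, Z=z, S=v) * P(S=v | Z=z) for one binary S."""
    return sum(
        cond(table, {"Y": 1}, {"X": x, "Z": z, s: v})
        * cond(table, {s: v}, {"Z": z})
        for v in (0, 1)
    )


def truth(x, z):
    """P(Y=1 | do(X=x), Z=z), computed in the intervened model."""
    return cond(joint(do_x=x), {"Y": 1}, {"Z": z})


if __name__ == "__main__":
    obs = joint()
    for x, z in product((0, 1), repeat=2):
        naive = cond(obs, {"Y": 1}, {"X": x, "Z": z})
        row = {s: round(adjusted(obs, s, x, z), 6) for s in "ABCD"}
        print(f"x={x} z={z} truth={truth(x, z):.6f} "
              f"naive={naive:.6f} adjusted={row}")
```

Under these assumptions, adjusting for any one of $A$, $B$, $C$ or $D$ together with $Z$ reproduces the interventional value, while conditioning on $X$ and $Z$ alone does not. Note that the sketch weights each stratum by $P[S=s \mid Z=z]$, which is what the law of total probability gives once $Z=z$ is in the conditioning set; with the marginal $P[S=s]$ the sum would in general not match.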
