Causal Diagram – Adjusting for Variables Outside of Minimal Adjustment Set for Total Causal Effect in a DAG

causal-diagram, dag, model selection, regression

I have a fairly elaborate Directed Acyclic Graph (DAG) for the analysis that I am running, but I am presenting a simplified example here to clarify a few things.

Here is a DAG from dagitty.net:

[Image: DAG from dagitty.net with nodes Treatment, Outcome, A, B, and C]

  • According to the graph, I only need to adjust for A in order to close
    the back door path and to identify the total causal effect of
    Treatment on Outcome. In other words, the minimal adjustment set for
    this diagram is just A.

  • Conversely, if I were to condition on C, the pathway
    Treatment -> C -> Outcome would be biased because C is on the front
    door path between the Treatment and the Outcome, so C should be left
    out from a regression model OR else B would also need to be
    conditioned on to close the formed back door path.

My question is about variables like B, for which adjustment is not strictly necessary (assuming C stays unadjusted). Adjusting for B or leaving it out entirely seems inconsequential for identifying the total causal effect of Treatment on Outcome. In this case, what are the implications, benefits or drawbacks of including B-type variables in my regression models? Would I not gain any precision or explanatory power in the model by including it as a control, rather than optionally leaving it out?

Best Answer

"Conversely, if I were to condition on C, the pathway Treatment -> C -> Outcome would be biased because C is on the front door path between the Treatment and the Outcome, so C should be left out from a regression model OR else B would also need to be conditioned on to close the formed back door path."

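I don't think this is right. Controlling for B would not solve all the problems you would introduce by controlling for C. It would block the non-causal path Treatment -> C <- B -> Outcome, which conditioning on the collider C opens. But the causal path Treatment -> C -> Outcome would still be blocked, because C is a mediator on it. The part of the effect that runs through C would therefore be removed, and you would no longer be estimating the total causal effect of Treatment on Outcome.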
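To make this concrete, here is a small simulation sketch. It assumes a linear version of the DAG with edges A -> Treatment, A -> Outcome, B -> C, B -> Outcome, Treatment -> C -> Outcome and a direct edge Treatment -> Outcome. The direct edge, the A edges and every coefficient are my own illustrative assumptions, not something taken from your actual graph. It compares the Treatment coefficient under three adjustment sets.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical linear structural model; every coefficient here is made up.
A = rng.normal(size=n)
B = rng.normal(size=n)
T = 0.8 * A + rng.normal(size=n)                 # A -> Treatment
C = 0.5 * T + 0.7 * B + rng.normal(size=n)       # Treatment -> C <- B
Y = 1.0 * T + 0.6 * C + 0.9 * A + 0.4 * B + rng.normal(size=n)

direct_effect = 1.0
total_effect = 1.0 + 0.5 * 0.6  # direct effect plus the part mediated by C


def treatment_coef(*controls):
    """OLS coefficient on Treatment when regressing Outcome on Treatment + controls."""
    X = np.column_stack([np.ones(n), T, *controls])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return beta[1]


est_A = treatment_coef(A)          # minimal adjustment set
est_AC = treatment_coef(A, C)      # mediator blocked, collider path opened
est_ACB = treatment_coef(A, C, B)  # collider path blocked, mediator still blocked

print(f"true total effect : {total_effect:.3f}")
print(f"true direct effect: {direct_effect:.3f}")
print(f"adjust for A      : {est_A:.3f}")
print(f"adjust for A, C   : {est_AC:.3f}")
print(f"adjust for A, C, B: {est_ACB:.3f}")
```

Under these assumptions, adjusting for A alone should land near the total effect, adjusting for A and C should land near neither the total nor the direct effect, and adjusting for A, C and B should land near the direct effect only. Adding B repairs the collider bias but does not bring back the effect that flows through C.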

"what are the implications, benefits or drawbacks of including B-type variables in my regression models? Would I not gain any precision or explanatory power in the model by including it as a control, rather than optionally leaving it out?"

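Yes, you would gain precision. Controlling for A is enough to identify the total causal effect of Treatment on Outcome without bias. Additionally controlling for B should yield a more precise estimator, because B is a cause of Outcome and adjusting for it removes outcome variation that would otherwise end up in the error term.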
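The precision claim can be checked with a repeated-sampling sketch. As before, this assumes a linear model with made-up coefficients and edges A -> Treatment, A -> Outcome, B -> C, B -> Outcome, Treatment -> C -> Outcome and Treatment -> Outcome. The sample size and number of replications are also arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
total_effect = 1.0 + 0.5 * 0.6  # direct effect plus the part mediated by C


def simulate_once(n=200):
    """Draw one sample and return the Treatment coefficient for {A} and {A, B}."""
    A = rng.normal(size=n)
    B = rng.normal(size=n)
    T = 0.8 * A + rng.normal(size=n)
    C = 0.5 * T + 0.7 * B + rng.normal(size=n)
    Y = 1.0 * T + 0.6 * C + 0.9 * A + 0.4 * B + rng.normal(size=n)

    def coef(*controls):
        X = np.column_stack([np.ones(n), T, *controls])
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        return beta[1]

    return coef(A), coef(A, B)


# Column 0: adjust for A only. Column 1: adjust for A and B.
estimates = np.array([simulate_once() for _ in range(2000)])
mean_A, mean_AB = estimates.mean(axis=0)
sd_A, sd_AB = estimates.std(axis=0, ddof=1)

print(f"adjust for A   : mean {mean_A:.3f}, sd {sd_A:.3f}")
print(f"adjust for A, B: mean {mean_AB:.3f}, sd {sd_AB:.3f}")
```

In this setup both estimators should center on the same total effect, while the spread across replications should be smaller when B is included. How large the gain is depends on how much of the Outcome's variance B explains, so it could be negligible in a real data set.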
