I have a fairly elaborate Directed Acyclic Graph (DAG) for the analysis that I am running, but I am presenting a simplified example here to clarify a few things.
Here is a DAG from dagitty.net:
- According to the graph, I only need to adjust for A in order to close the back-door path and identify the total causal effect of Treatment on Outcome. In other words, the minimal adjustment set for this diagram is just A.
- Conversely, if I were to condition on C, the estimate for the pathway Treatment -> C -> Outcome would be biased, because C is on the front-door path between Treatment and Outcome. So C should be left out of the regression model, or else B would also need to be conditioned on to close the back-door path that this opens.
My question is about variables like B, for which adjustment is not strictly necessary (assuming C stays unadjusted for). Conditioning on B or leaving it out completely seemingly makes no difference to the identification of the total causal effect of Treatment on Outcome. In this case, what are the implications, benefits or drawbacks of including B-type variables in my regression models? Would I gain any precision or explanatory power by including B as a control, rather than leaving it out?
Best Answer
On conditioning on B to compensate for conditioning on C: I don't think this is right. Controlling for B would not solve all the problems you would introduce by controlling for C. While it would help you close the non-causal path Treatment -> C <- B -> Outcome (which conditioning on the collider C opens), the front-door path Treatment -> C -> Outcome would still be blocked, so you would no longer be estimating the total effect.

On the minimal adjustment set: yes, that's right. Controlling for A allows you to identify the total causal effect of Treatment on Outcome without bias. Additionally controlling for B should yield a more precise estimator.
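
To make this concrete, here is a minimal simulation sketch. Since the DAG image is not reproduced here, the edge list is an assumption read off the paths discussed: A -> Treatment, A -> Outcome, Treatment -> C, B -> C, B -> Outcome and C -> Outcome, with no direct Treatment -> Outcome edge. The linear Gaussian structure, the coefficients and the helper name `ols_first_coef` are all made up for illustration. Under these assumptions the true total effect is 0.5 * 0.6 = 0.3.

```python
import random


def ols_first_coef(y, regressors):
    """Fit OLS with an intercept; return (coefficient, standard error)
    for the first regressor. Hypothetical helper, standard library only."""
    n = len(y)
    cols = [[1.0] * n] + list(regressors)
    k = len(cols)
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    # Invert X'X by Gauss-Jordan elimination on the augmented matrix [X'X | I]
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(xtx)]
    for i in range(k):
        pivot_row = max(range(i, k), key=lambda r: abs(aug[r][i]))
        aug[i], aug[pivot_row] = aug[pivot_row], aug[i]
        pivot = aug[i][i]
        aug[i] = [v / pivot for v in aug[i]]
        for r in range(k):
            if r != i:
                factor = aug[r][i]
                aug[r] = [vr - factor * vi for vr, vi in zip(aug[r], aug[i])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    # Residual variance and the usual OLS standard error
    resid_ss = sum(
        (yi - sum(b * c[t] for b, c in zip(beta, cols))) ** 2
        for t, yi in enumerate(y)
    )
    sigma2 = resid_ss / (n - k)
    return beta[1], (sigma2 * inv[1][1]) ** 0.5


# Simulate from the ASSUMED DAG (all coefficients are illustrative)
random.seed(0)
n = 20_000
A = [random.gauss(0, 1) for _ in range(n)]
B = [random.gauss(0, 1) for _ in range(n)]
T = [0.8 * a + random.gauss(0, 1) for a in A]                        # A -> Treatment
C = [0.5 * t + 0.7 * b + random.gauss(0, 1) for t, b in zip(T, B)]   # Treatment -> C <- B
Y = [0.6 * c + 1.0 * a + 1.5 * b + random.gauss(0, 1)                # C, A, B -> Outcome
     for c, a, b in zip(C, A, B)]

TRUE_TOTAL_EFFECT = 0.5 * 0.6  # Treatment -> C -> Outcome

# Treatment is always the first regressor, so its coefficient is returned
fits = {
    "A only": ols_first_coef(Y, [T, A]),
    "A and B": ols_first_coef(Y, [T, A, B]),
    "A and C": ols_first_coef(Y, [T, A, C]),
    "A, B and C": ols_first_coef(Y, [T, A, B, C]),
}
for label, (coef, se) in fits.items():
    print(f"adjusting for {label:<10}: estimate = {coef:+.3f}, SE = {se:.4f}")
```

In this setup, adjusting for A alone and for A and B should both give estimates close to 0.3, with a smaller standard error once B is included, because B explains Outcome variance that would otherwise sit in the residual. Adjusting for A and C should give an estimate far from 0.3, since the collider path through B is open and the mediated path is blocked. Adding B on top of C removes the collider bias but recovers only the direct effect (zero here), not the total effect.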