In a DAG, why does a randomized controlled trial ensure there are no backdoor paths from treatment to response and hence no omitted variable bias?
Clinical Trials – Understanding Randomized Controlled Trials and Directed Acyclic Graphs (DAG)
clinical-trials, dag
Related Solutions
When you look closely, there is no causal path from T to Growth, but there are two backdoor paths, both going through UC:
- one directly from T over UC to Growth
- and one from T over UC to dT to Growth
So, if indeed UC is unobserved in your study, there is no way to get an unbiased causal estimate for the effect of T on Growth from the data, given this DAG.
That said, adjusting for dT will close one of the two backdoor paths (the second one) and, since dT serves as a proxy (or surrogate) variable for UC, it will also partially close the other backdoor path. (More precisely, only the variation in UC that is not shared with dT will remain as a confounding influence.)
More generally, this problem seems a bit ill-defined: A DAG encodes your structural assumptions about the data-generating process. If you assume no (direct or indirect) causal effect of T on Growth, as is done in the DAG, the question of how to estimate that effect, given the DAG, becomes a bit nonsensical. It's zero by definition.
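The partial closing of the backdoor paths can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the original DAG: all relationships are linear with unit coefficients and standard normal noise, and the helper name `coef_on_first` is made up for the example. The true effect of T on Growth is zero by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Assumed linear version of the DAG: UC -> T, UC -> dT, UC -> Growth, dT -> Growth.
uc = rng.normal(size=n)                  # unobserved confounder
t = uc + rng.normal(size=n)              # treatment, caused only by UC
dt = uc + rng.normal(size=n)             # noisy proxy of UC
growth = uc + dt + rng.normal(size=n)    # no T term: the true effect of T is zero


def coef_on_first(y, *regressors):
    """Fit OLS with an intercept and return the coefficient on the first regressor."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


naive = coef_on_first(growth, t)          # both backdoor paths open
adj_dt = coef_on_first(growth, t, dt)     # one path closed, the other partially closed
adj_uc = coef_on_first(growth, t, uc)     # oracle: only possible if UC were observed

print(f"unadjusted: {naive:.3f}, adjusted for dT: {adj_dt:.3f}, adjusted for UC: {adj_uc:.3f}")
```

With these particular coefficients the estimates come out at roughly 1, 1/3 and 0: adjusting for dT shrinks the bias but does not remove it, and only adjusting for UC itself recovers the true zero effect.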
Regarding your question:
> For Multiple Linear Regression on the covariates, should all the covariates be taken into account or only those in the adjustment set for the given exposure and outcome?
If you aim for a specific causal estimand and choose multiple regression as your estimation technique, then including the adjustment set in the model is the absolutely necessary minimum.
There can be benefits from including further variables that are not in the adjustment set (for example, you typically gain precision by including predictors of the outcome, even though they are not confounders), and there can be harms (for example, you might re-introduce bias by opening a previously closed path). This paper gives an excellent overview of these situations and of the question of what to adjust for and what not to adjust for.
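Both situations can be sketched in a toy simulation. Everything here is an assumption made for illustration: the treatment `t` is randomized (so the adjustment set is empty), `x` is a pure outcome predictor, `c` is a collider caused by both treatment and outcome, and the true treatment effect is set to 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

t = rng.integers(0, 2, size=n).astype(float)   # randomized binary treatment
x = rng.normal(size=n)                          # predicts the outcome, not a confounder
y = 1.0 * t + 2.0 * x + rng.normal(size=n)      # true effect of t is 1.0
c = t + y + rng.normal(size=n)                  # collider: caused by both t and y


def ols(y, *regressors):
    """OLS with intercept; return (coefficient, standard error) of the first regressor."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta[1], se[1]


b_plain, se_plain = ols(y, t)          # unbiased, but noisy
b_prec, se_prec = ols(y, t, x)         # unbiased and more precise
b_coll, se_coll = ols(y, t, c)         # biased: conditioning on the collider

print(f"t only:       {b_plain:.3f} (SE {se_plain:.4f})")
print(f"t + x:        {b_prec:.3f} (SE {se_prec:.4f})")
print(f"t + collider: {b_coll:.3f} (SE {se_coll:.4f})")
```

In this setup, adding `x` leaves the estimate near 1 while shrinking its standard error, whereas adding `c` pushes the estimate far from 1 (to about -2/3 with these coefficients), even though the unadjusted estimate was already unbiased.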
In a randomized trial, the treatment is randomized, so there is no confounding of the treatment-outcome relationship or the treatment-mediator relationship. For those relationships, associations represent causal effects.
However, the mediator is not randomized. It is in part affected by the treatment, and in part affected by a collection of many other factors. If any of those factors also affect the outcome, you have confounding of the mediator-outcome relationship (sometimes called the "b" path). You need to adjust for this confounding in order to validly estimate the causal effect of the mediator on the outcome.
The confounding of the mediator-outcome relationship is the primary reason people are often skeptical of mediation analysis. For many mediators, it is impossible to observe enough variables to eliminate mediator-outcome confounding (i.e., to collect all common causes of the mediator and outcome other than the treatment).
In a randomized trial, you can estimate the causal effect of the treatment on the outcome (the total effect) and the causal effect of the treatment on the mediator (the "a" path), but you cannot estimate the causal effect of the mediator on the outcome (the "b" path) without additional work to remove confounding. A consequence of this is that you cannot estimate either the indirect effect or the direct effect; only the total effect is available without further adjustment.
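A small simulation makes the point concrete. It is a sketch under assumed values, not a general mediation analysis: the model is linear, `u` is a hypothetical unobserved common cause of the mediator and the outcome, and the path coefficients `a`, `b` and `direct` are chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
a, b, direct = 1.0, 0.5, 0.3      # assumed true path coefficients

t = rng.integers(0, 2, size=n).astype(float)       # randomized treatment
u = rng.normal(size=n)                              # unobserved cause of both M and Y
m = a * t + u + rng.normal(size=n)                  # mediator
y = direct * t + b * m + u + rng.normal(size=n)     # outcome


def ols(y, *regressors):
    """OLS with intercept; return the slope coefficients in regressor order."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]


total_hat = ols(y, t)[0]                  # total effect: direct + a * b = 0.8
a_hat = ols(m, t)[0]                      # "a" path
direct_hat, b_hat = ols(y, t, m)          # confounded by u
direct_adj, b_adj, _ = ols(y, t, m, u)    # oracle adjustment, needs u observed

print(f"total: {total_hat:.3f}, a: {a_hat:.3f}")
print(f"without u -> direct: {direct_hat:.3f}, b: {b_hat:.3f}")
print(f"with u    -> direct: {direct_adj:.3f}, b: {b_adj:.3f}")
```

Randomization alone recovers the total effect (about 0.8) and the "a" path (about 1), but the "b" path and the direct effect come out near 1.0 and -0.2 instead of 0.5 and 0.3 unless `u` is adjusted for.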
Even if you do collect confounders of the mediator-outcome relationship, it is not straightforward to adjust for them if they are also caused by the treatment, which is yet another difficulty with mediation analysis. Given these difficulties, mediation analysis should be used very sparingly and only in cases where we understand the confounding mechanism well and can adjust for it.
Note that none of this is "by chance"; everything I said above would be true even if we were able to run a randomized trial in the entire population (i.e., with an infinite sample size).
Best Answer
Quite simply, an RCT ensures there are no backdoor paths between treatment $A$ and outcome $Y$ (technically, it reduces the possibility of backdoor confounding to chance imbalance, which shrinks as the sample size grows), because by definition random assignment $R$ is the only prior cause of treatment:
$$\boxed{R} \to A \to Y$$
In the simple DAG above, randomization is the only cause of treatment. A backdoor path through some third variable, such as disease severity or smoking history, would require that variable to be a cause of treatment; but if it were, then randomization would not actually be what assigned treatment.
In this DAG notation, the box around $R$ (the randomizing process) indicates that it has no prior cause (i.e. it is a purely probabilistic phenomenon).
This specific fact about random assignment—that it reduces the role of confounding via a backdoor path to chance—is rather the entire point of random assignment to treatment.
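As a rough illustration, the following simulation contrasts a confounded assignment with a randomized one. All specifics are assumptions for the sketch: `severity` stands in for the third variable, the true treatment effect is set to 1, and in the observational arm sicker patients are more likely to be treated.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
severity = rng.normal(size=n)     # hypothetical prognostic variable


def outcome(a):
    """Outcome with an assumed true treatment effect of 1.0 and a severity effect of 2.0."""
    return 1.0 * a + 2.0 * severity + rng.normal(size=n)


def diff_in_means(y, a):
    """Unadjusted treated-minus-control difference in mean outcome."""
    return y[a == 1].mean() - y[a == 0].mean()


# Observational assignment: severity -> A opens the backdoor path A <- severity -> Y.
a_obs = (severity + rng.normal(size=n) > 0).astype(float)
# Randomized assignment: a coin flip R is the only cause of A.
a_rct = rng.integers(0, 2, size=n).astype(float)

est_obs = diff_in_means(outcome(a_obs), a_obs)
est_rct = diff_in_means(outcome(a_rct), a_rct)
corr_obs = np.corrcoef(a_obs, severity)[0, 1]
corr_rct = np.corrcoef(a_rct, severity)[0, 1]

print(f"observational: estimate {est_obs:.3f}, corr(A, severity) {corr_obs:.3f}")
print(f"randomized:    estimate {est_rct:.3f}, corr(A, severity) {corr_rct:.3f}")
```

Under randomization the correlation between treatment and severity is zero up to sampling noise, so the unadjusted difference in means sits close to the true effect of 1, while the confounded assignment overstates it substantially.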