Conditional Probability – Understanding the Difference Between Potential Outcome and Outcome Conditioned on Treatment

causality, conditional probability

I'm reading up on Rubin causal inference and haven't been able to find a clear distinction between a potential outcome and the outcome conditioned on treatment, although this distinction is supposed to be crucial.

Writing subject $i$'s response, $Y_i$, in terms of potential outcomes and treatment $X_i$ I see this definition in many sources:
$$Y_i = \begin{cases} Y_i(1), & X_i = 1 \\ Y_i(0), & X_i = 0 \end{cases} = Y_i(1)X_i + Y_i(0)(1 - X_i)$$

By definition, potential outcome $Y_i(1)$ is the outcome when subject $i$ receives treatment. So how exactly is this different from $Y_i | X_i = 1$, which would be subject $i$'s outcome given they received treatment?

Best Answer

See my post about this here. The gist is that potential outcomes are pretreatment variables: they exist prior to treatment assignment. Receiving the treatment reveals one of the potential outcomes and leaves the other hidden. It is true that, after assigning treatment and measuring the outcome, the outcome conditional on the treatment equals the potential outcome corresponding to the received treatment level. But you can talk about potential outcomes without considering the treatment as a realized random variable at all, for example when defining the causal effect $E[Y(1)] - E[Y(0)]$, for which the treatment does not need to be realized or measured.
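To make this concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than something from the answer: a single binary confounder `w`, a constant treatment effect of 2, and treatment probabilities of 0.8 and 0.2. Both potential outcomes are generated before treatment is assigned, and the observed outcome is then built from them using the formula in the question. Under these assumptions the naive contrast $E[Y|X=1] - E[Y|X=0]$ targets 2.6, not the causal effect of 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical binary confounder W that affects both treatment and outcome.
w = rng.binomial(1, 0.5, size=n)

# Potential outcomes exist for every subject before any treatment is assigned.
y0 = 1.0 * w + rng.normal(size=n)
y1 = y0 + 2.0  # assumed constant causal effect of 2

# Assignment mechanism: subjects with W = 1 are more likely to be treated.
x = rng.binomial(1, np.where(w == 1, 0.8, 0.2))

# Consistency: the observed outcome reveals exactly one potential outcome.
y = y1 * x + y0 * (1 - x)

# E[Y(1)] - E[Y(0)] is defined without reference to the realized treatment.
ate_true = y1.mean() - y0.mean()

# E[Y | X=1] - E[Y | X=0] mixes the causal effect with confounding by W.
naive = y[x == 1].mean() - y[x == 0].mean()

# Adjustment: average the within-W contrasts over the marginal distribution of W.
adjusted = sum(
    (y[(x == 1) & (w == v)].mean() - y[(x == 0) & (w == v)].mean()) * np.mean(w == v)
    for v in (0, 1)
)

print(f"true ATE: {ate_true:.3f}, naive: {naive:.3f}, adjusted: {adjusted:.3f}")
```

Among the treated, `y` and `y1` agree exactly, which is the sense in which conditioning on $X = 1$ "reveals" $Y(1)$. The averages still differ because the treated subjects are not a random sample of the population.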

It is true that $[Y|X=x] = [Y(x)|X=x]$ for treatment $X$ (under consistency). This fact is used in the critical proofs for identifying causal effects under strong ignorability. For example, the proof that $E_W[E[Y|X=1, W]] = E[Y(1)]$ for a sufficient adjustment set $W$ relies on the first step $$ E_W[E[Y|X=1, W]] = E_W[E[Y(1)|X=1, W]] $$ which is a direct result of consistency. But we can also talk about potential outcomes without considering them as the outcome conditional on treatment, as in the statement of strong ignorability, $$ Y(x) \perp X|W $$ which does not involve the observed outcome $Y$ at all and is a statement about the assignment mechanism. To say instead $Y \perp X|W,X$, that is, replacing $Y(x)$ with $Y|X$, makes the statement trivial, because anything is independent of $X$ once you condition on $X$, when in fact strong ignorability is a very strong assumption.
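For reference, the full identification argument can be written out step by step. This is a sketch assuming consistency, ignorability $Y(1) \perp X|W$, and positivity $P(X=1|W) > 0$. Positivity is not stated in the answer, but it is needed for the conditional expectations to be well defined.

$$
\begin{aligned}
E_W[E[Y|X=1, W]] &= E_W[E[Y(1)|X=1, W]] && \text{(consistency)} \\
&= E_W[E[Y(1)|W]] && \text{(ignorability: } Y(1) \perp X|W\text{)} \\
&= E[Y(1)] && \text{(law of iterated expectations)}
\end{aligned}
$$

Only the first step swaps $Y$ for $Y(1)$. The second step is where the potential outcome has to be treated as a variable in its own right, separate from the treatment actually received.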
