Solved – Difference between rungs two and three in the Ladder of Causation

causality

In Judea Pearl's "Book of Why" he talks about what he calls the Ladder of Causation, which is essentially a hierarchy comprised of different levels of causal reasoning. The lowest is concerned with patterns of association in observed data (e.g., correlation, conditional probability, etc.), the next focuses on intervention (what happens if we deliberately change the data generating process in some prespecified way?), and the third is counterfactual (what would happen in another possible world if something had or had not happened)?

What I'm not understanding is how rungs two and three differ. If we ask a counterfactual question, are we not simply asking a question about intervening so as to negate some aspect of the observed world?

Best Answer

There is no contradiction between the factual world and the action of interest in the interventional level. For example, smoking until today and being forced to quit smoking starting tomorrow are not in contradiction with each other, even though you could say one “negates” the other. But now imagine the following scenario. You know Joe, a lifetime smoker who has lung cancer, and you wonder: what if Joe had not smoked for thirty years, would he be healthy today? In this case we are dealing with the same person, in the same time, imagining a scenario where action and outcome are in direct contradiction with known facts.

Thus, the main difference of interventions and counterfactuals is that, whereas in interventions you are asking what will happen on average if you perform an action, in counterfactuals you are asking what would have happened had you taken a different course of action in a specific situation, given that you have information about what actually happened. Note that, since you already know what happened in the actual world, you need to update your information about the past in light of the evidence you have observed.

These two types of queries are mathematically distinct because they require different levels of information to be answered (counterfactuals need more information to be answered) and even more elaborate language to be articulated!.

With the information needed to answer Rung 3 questions you can answer Rung 2 questions, but not the other way around. More precisely, you cannot answer counterfactual questions with just interventional information. Examples where the clash of interventions and counterfactuals happens were already given here in CV, see this post and this post. However, for the sake of completeness, I will include an example here as well.

The example below can be found in Causality, section 1.4.4.

Consider that you have performed a randomized experiment where patients were randomly assigned (50% / 50%) to treatment ($x =1$) and control conditions ($x=0$), and in both treatment and control groups 50% recovered ($y=0$) and 50% died ($y=1$). That is $P(y|x) = 0.5~~~\forall x,y$.

The result of the experiment tells you that the average causal effect of the intervention is zero. This is a rung 2 question, $P(Y = 1|do(X = 1)) - P(Y=1|do(X =0) = 0$.

But now let us ask the following question: what percentage of those patients who died under treatment would have recovered had they not taken the treatment? Mathematically, you want to compute $P(Y_{0} = 0|X =1, Y = 1)$.

This question cannot be answered just with the interventional data you have. The proof is simple: I can create two different causal models that will have the same interventional distributions, yet different counterfactual distributions. The two are provided below:

enter image description here

Here, $U$ amounts to unobserved factors that explain how the patient reacts to the treatment. You can think of factors that explain treatment heterogeneity, for instance. Note the marginal distribution $P(y, x)$ of both models agree.

Note that, in the first model, no one is affected by the treatment, thus the percentage of those patients who died under treatment that would have recovered had they not taken the treatment is zero.

However, in the second model, every patient is affected by the treatment, and we have a mixture of two populations in which the average causal effect turns out to be zero. In this example, the counterfactual quantity now goes to 100% --- in Model 2, all patients who died under treatment would have recovered had they not taken the treatment.

Thus, there's a clear distinction of rung 2 and rung 3. As the example shows, you can't answer counterfactual questions with just information and assumptions about interventions. This is made clear with the three steps for computing a counterfactual:

  1. Step 1 (abduction): update the probability of unobserved factors $P(u)$ in light of the observed evidence $P(u|e)$
  2. Step 2 (action): perform the action in the model (for instance $do(x))$.
  3. Step 3 (prediction): predict $Y$ in the modified model.

This will not be possible to compute without some functional information about the causal model, or without some information about latent variables.

Related Question