Solved – How to resolve Simpson’s paradox

simpsons-paradox

Simpson's paradox is a classic puzzle discussed in introductory statistics courses worldwide. However, my course was content to simply note that the problem existed and did not provide a solution. I would like to know how to resolve the paradox. That is, when confronted with a Simpson's paradox, where two different choices seem to compete for being the best choice depending on how the data is partitioned, which choice should one make?

To make the problem concrete, let's consider the first example given in the relevant Wikipedia article. It is based on a real study about a treatment for kidney stones.

|              | Treatment A     | Treatment B     |
|--------------|-----------------|-----------------|
| Small stones | **93%** (81/87) | 87% (234/270)   |
| Large stones | **73%** (192/263) | 69% (55/80)   |
| Both         | 78% (273/350)   | **83%** (289/350) |

Suppose I am a doctor and a test reveals that a patient has kidney stones. Using only the information in the table, I would like to determine whether I should adopt treatment A or treatment B. It seems that if I know the size of the stone, I should prefer treatment A, but if I do not, I should prefer treatment B.

But consider another plausible way to arrive at an answer. If the stone is large, I should choose A; if it is small, I should again choose A. So even if I do not know the size of the stone, reasoning case by case tells me to prefer A. This contradicts the earlier conclusion.
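Concretely, from the table: treatment A wins within each stratum ($93\% > 87\%$ for small stones, $73\% > 69\%$ for large ones), yet it loses in aggregate:

$$P(\text{success} \mid A) = \frac{81 + 192}{350} = 78\%, \qquad P(\text{success} \mid B) = \frac{234 + 55}{350} \approx 83\%,$$

because treatment A was mostly given to the harder, large-stone cases while treatment B was mostly given to the easier, small-stone cases.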

So: A patient walks into my office. A test reveals they have kidney stones but gives me no information about their size. Which treatment do I recommend? Is there any accepted resolution to this problem?

Wikipedia hints at a resolution using "causal Bayesian networks" and a "back-door" test, but I have no clue what these are.

Best Answer

In your question, you state that you don't know what "causal Bayesian networks" and "back door tests" are.

Suppose you have a causal Bayesian network, that is, a directed acyclic graph whose nodes represent propositions and whose directed edges represent potential causal relationships. You may have many such networks, one for each of your hypotheses. There are three ways to make a compelling argument about the strength or existence of an edge $A \stackrel{?}{\rightarrow} B$.

The easiest way is an intervention. This is what the other answers are suggesting when they say that "proper randomization" will fix the problem. You randomly force $A$ to take different values and you measure $B$. If you can do that, you're done, but you can't always do that. In your example, it may be unethical to give people ineffective treatments for deadly diseases, or patients may have some say in their treatment, e.g., they may choose the less harsh option (treatment B) when their kidney stones are small and less painful. The simulation below illustrates the difference randomization makes.
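Here is a minimal simulation sketch. The per-stratum success probabilities match the table in the question, but the 80/20 assignment probabilities are hypothetical, chosen only to mimic the story of doctors steering treatment A toward large stones:

```python
import random

random.seed(0)

# Hypothetical set-up: per-stratum success rates match the table above,
# but the 80/20 assignment probabilities are made up for illustration.
P_LARGE = 0.5
P_A_GIVEN_SIZE = {"large": 0.8, "small": 0.2}   # observational assignment
P_SUCCESS = {("A", "large"): 0.73, ("A", "small"): 0.93,
             ("B", "large"): 0.69, ("B", "small"): 0.87}

def trial(randomize):
    """Simulate one patient; if randomize, we intervene on the treatment."""
    size = "large" if random.random() < P_LARGE else "small"
    if randomize:
        treatment = random.choice(["A", "B"])   # intervention: force the value
    else:
        treatment = "A" if random.random() < P_A_GIVEN_SIZE[size] else "B"
    success = random.random() < P_SUCCESS[(treatment, size)]
    return treatment, success

def success_rates(randomize, n=100_000):
    wins = {"A": 0, "B": 0}
    counts = {"A": 0, "B": 0}
    for _ in range(n):
        t, s = trial(randomize)
        counts[t] += 1
        wins[t] += s
    return {t: round(wins[t] / counts[t], 3) for t in ("A", "B")}

print("observational:", success_rates(randomize=False))  # B appears better
print("randomized:   ", success_rates(randomize=True))   # A is actually better
```

The observational estimate ranks B ahead of A, while the randomized (interventional) estimate correctly ranks A ahead, even though both are computed from the same underlying success probabilities.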

The second way is the front door method. You want to show that $A$ acts on $B$ via $C$, i.e., $A\rightarrow C \rightarrow B$. If you assume that $C$ is potentially caused by $A$ but has no other causes, and you can measure that $C$ is correlated with $A$, and $B$ is correlated with $C$, then you can conclude evidence must be flowing via $C$. The original example: $A$ is smoking, $B$ is cancer, $C$ is tar accumulation. Tar can only come from smoking, and it correlates with both smoking and cancer. Therefore, smoking causes cancer via tar (though there could be other causal paths that mitigate this effect).
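For reference, the corresponding front-door adjustment formula (standard in Pearl's framework, with $C$ as the mediator) recovers the interventional distribution from observational quantities:

$$P(b \mid \mathrm{do}(a)) = \sum_c P(c \mid a) \sum_{a'} P(b \mid c, a')\,P(a').$$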

The third way is the back door method. You want to show that $A$ and $B$ aren't correlated merely because of a "back door", e.g., a common cause $D$: $A \leftarrow D \rightarrow B$. Since you have assumed a causal model, you merely need to block all of the paths (by observing variables and conditioning on them) along which evidence can flow up from $A$ and back down to $B$. It's a bit tricky to block these paths, but Pearl gives a clear algorithm that lets you know which variables you have to observe to block them.
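Once you have a set of observed variables $D$ that blocks every back-door path, the standard back-door adjustment recovers the interventional distribution from observational data:

$$P(b \mid \mathrm{do}(a)) = \sum_d P(b \mid a, d)\,P(d).$$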

gung is right that with good randomization, confounders won't matter. Since we're assuming that intervening at the hypothetical cause (treatment) is not allowed, any common cause of the hypothetical cause (treatment) and the effect (survival), such as age or kidney stone size, will be a confounder. The solution is to take the right measurements to block all of the back doors. For further reading, see:

Pearl, Judea. "Causal diagrams for empirical research." Biometrika 82.4 (1995): 669-688.


To apply this to your problem, let us first draw the causal graph. (Treatment-preceding) kidney stone size $X$ and treatment type $Y$ are both causes of treatment success $Z$. $X$ may also be a cause of $Y$ if doctors are assigning treatment based on kidney stone size. Clearly there are no other causal relationships between $X$, $Y$, and $Z$: $Y$ comes after $X$, so it cannot be a cause of $X$, and similarly $Z$ comes after both $X$ and $Y$.

Since $X$ is a common cause, it should be measured. It is up to the experimenter to determine the universe of variables and potential causal relationships. For every experiment, the experimenter measures the necessary "back door variables" and then calculates the marginal probability distribution of treatment success for each configuration of those variables. For a new patient, you measure the variables and follow the treatment indicated by the marginal distribution; a sketch of this calculation for the kidney-stone table is given below. If you can't measure everything, or you don't have a lot of data but know something about the architecture of the relationships, you can do "belief propagation" (Bayesian inference) on the network.
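Here is a minimal Python sketch of that calculation, assuming the counts from the table in the question and the graph just described ($X \rightarrow Y$, $X \rightarrow Z$, $Y \rightarrow Z$, with $X$ as the only back-door variable):

```python
# Back-door adjustment on the kidney-stone table: condition on stone size X,
# then average over the marginal distribution of X:
#   P(Z = success | do(Y = t)) = sum_x P(Z | Y = t, X = x) * P(X = x)
counts = {
    # (treatment, size): (successes, total)
    ("A", "small"): (81, 87),
    ("A", "large"): (192, 263),
    ("B", "small"): (234, 270),
    ("B", "large"): (55, 80),
}

sizes = ["small", "large"]
n_total = sum(total for _, total in counts.values())          # 700 patients
p_size = {s: sum(counts[(t, s)][1] for t in "AB") / n_total   # P(X = s)
          for s in sizes}

for t in "AB":
    adjusted = sum(counts[(t, s)][0] / counts[(t, s)][1] * p_size[s]
                   for s in sizes)
    print(f"treatment {t}: adjusted success rate = {adjusted:.3f}")
```

With these counts, treatment A comes out at roughly 83% and treatment B at roughly 78%, so the adjusted (interventional) comparison agrees with the per-stratum reasoning: recommend treatment A. The aggregate table favored B only because B was disproportionately given the easy, small-stone cases.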