Simpson's paradox is an extreme form of confounding in which the apparent sign of a correlation is reversed; you haven't said whether that is the situation here.
I can see at least three possibilities here: heterogeneity between the subgroups, the reduction in sample size within each subgroup, and poor definition of the subgroups in a way that presupposes the results. Setting the third aside, both of the first two can have an impact: in my experience it is often the small sample size that leads to non-significance in the smaller subgroup, and heterogeneity that causes the whole group to produce a significant result while the larger subgroup does not.
That was an over-generalisation - each case will have its own issues.
I think A and E aren't a good combination, because A says you should pick Mercy and E says you should pick Hope.
A and D have the virtue of advocating the same choice. But let's examine the line of reasoning in D in more detail, since that seems to be the source of the confusion. The probability of success for the surgeries follows the same ordering at both hospitals, with type A being the most likely to succeed and type E the least likely. If we collapse over (i.e., ignore) the hospitals, the marginal probability of success for each surgery type is:
    Type    A    B    C    D    E   All
    Prob  .81  .78  .56  .21  .08   .52
Because E is much less likely to be successful, it is reasonable to imagine that it is more difficult (although in the real world other possibilities exist as well). We can extend that line of thinking to the other four types. Now let's look at what proportion of each hospital's total surgeries is of each type:
    Type     A    B    C    D    E
    Mercy  .08  .39  .06  .44  .03
    Hope   .09  .54  .23  .09  .05
What we notice here is that Hope tends to do more of the easier surgeries A-C (and especially B & C), and fewer of the harder surgeries like D. E is pretty uncommon in both hospitals, but, for what it's worth, Hope actually does a higher percentage. Nonetheless, the Simpson's Paradox effect is going to mostly be driven by B-D here (not actually column E as answer choice D suggested).
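To make the reversal concrete, here is a minimal Python sketch; the case-mix proportions and per-type success rates are taken from the tables in this answer. Each hospital's overall success rate is just a case-mix-weighted average of its per-type rates:

```python
# Case mix: share of each hospital's surgeries that are of types A-E.
mix  = {"Mercy": [.08, .39, .06, .44, .03],
        "Hope":  [.09, .54, .23, .09, .05]}
# Per-type success probabilities at each hospital.
rate = {"Mercy": [.81, .79, .60, .21, .09],
        "Hope":  [.80, .76, .51, .14, .04]}

# Overall success rate = sum over types of (case-mix share * success rate).
overall = {h: sum(m * r for m, r in zip(mix[h], rate[h])) for h in mix}
print(overall)  # Mercy ~0.50, Hope ~0.61: Hope "wins" in the aggregate...

# ...even though Mercy is at least as good within every single type:
assert all(m >= h for m, h in zip(rate["Mercy"], rate["Hope"]))
```

Hope's heavier load of easy B and C surgeries (and lighter load of hard D surgeries) is what pushes its aggregate rate above Mercy's.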
Simpson's Paradox occurs here because the surgeries vary in difficulty (in general) and also because the N's differ between hospitals. It is the differing base rates of the different types of surgeries that make this counter-intuitive. What is happening would be easy to see if both hospitals did exactly the same number of each type of surgery. We can mimic that by taking each hospital's per-type success probability and multiplying by 100, which gives the expected number of successes if each hospital performed 100 surgeries of each type; this adjusts for the different frequencies:
    Type     A    B    C    D    E   All
    Mercy   81   79   60   21   09   250
    Hope    80   76   51   14   04   225
Now, because both hospitals "did" 100 of each type of surgery (500 total), the answer is obvious: Mercy, with 250 expected successes versus 225, is the better hospital.
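The standardization step is simple arithmetic; a quick sketch using the same per-type rates:

```python
# Per-type success probabilities at each hospital (from the table above).
rate = {"Mercy": [.81, .79, .60, .21, .09],
        "Hope":  [.80, .76, .51, .14, .04]}

# Pretend each hospital performs 100 surgeries of every type:
# expected successes per type = probability * 100, then total them.
totals = {h: round(sum(r) * 100) for h, r in rate.items()}
print(totals)  # {'Mercy': 250, 'Hope': 225}
```

Once the case mix is equalized, Mercy's within-type advantage shows up directly in the totals.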
Best Answer
In Simpson's paradox there is a reversal of the sign of the correlation between two variables, or equivalently, a flip in the sign of the regression coefficients (slopes), due to an unaccounted-for confounding variable.
In the following illustration, we are looking at the relationship between a fictitious biochemical marker in blood ("Marker X") and a second marker measured in the same blood test ("Marker Y"). We are pooling together the data from four different studies. We first look at a "successful" meta-analysis, to eventually contrast it with a scenario where Simpson's paradox rears its head.
Scenario 1:
As the data is aggregated, we end up with a situation best modeled as a mixed model with random effects explaining the variability between the different studies (unit effects), of the form $\boldsymbol{y} = X \boldsymbol{\beta} + Z \boldsymbol{u} + \boldsymbol{\epsilon}$, with $\boldsymbol u$ corresponding to the intercepts of the lines fitted through the $y\sim x$ data cloud for each individual study. This model accounts for the possible presence of a substantial amount of dispersion in the data. Here's what it would look like (code):
with the data from each study color-coded on the left plot, and aggregated on the right. The situation is not ideal, because there is quite a bit of spread in the values from the different datasets. The study would probably be much more likely to be published and quoted with something like this:
So perfect an overlap between studies that an ordinary least squares (OLS) fit would probably be better than a mixed model with random effects.
However, in either instance, the mere presence of over-dispersion doesn't by itself raise the possibility of a confounding variable.
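The linked code isn't reproduced here, but the idea behind the random-intercept model can be sketched in plain Python with simulated data (all parameters below are made up). Rather than a full mixed model, a fixed-effects regression with one dummy intercept per study (the "within" estimator) recovers the same pooled within-study slope:

```python
import numpy as np

rng = np.random.default_rng(0)
n_studies, n_per = 4, 50
true_slope = 0.5

# Simulated meta-analysis: a shared slope plus study-specific intercepts u_j.
u = rng.normal(0.0, 1.0, n_studies)
x = rng.uniform(0.0, 10.0, (n_studies, n_per))
y = 2.0 + u[:, None] + true_slope * x + rng.normal(0.0, 0.3, (n_studies, n_per))

# Design matrix: column 0 is x, then one dummy intercept column per study.
X = np.zeros((n_studies * n_per, n_studies + 1))
X[:, 0] = x.ravel()
for j in range(n_studies):
    X[j * n_per:(j + 1) * n_per, j + 1] = 1.0

beta, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
print(beta[0])  # pooled within-study slope, close to the true 0.5
```

In R with lme4 the analogous random-intercept model would be `lmer(y ~ x + (1 | study))`; the sketch above is a cheap approximation of the shared slope, not the original analysis.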
On the other hand...
Scenario 2:
... we can encounter a distribution of the data cloud such that fitting a linear regression with mixed effects ends up with negative (or positive) slopes for each one of the individual studies, only for the slope to reverse sign when the data is aggregated:
This is the Simpson's effect.
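This reversal is easy to reproduce in a small simulation (the parameters below are made up, not the data behind the plots above): four "studies" each trend downward in $y$ as $x$ increases, while the study centers drift upward along both axes:

```python
import numpy as np

rng = np.random.default_rng(42)
within_slope = -0.3              # each study trends downward in y with x...
groups = []
for j in range(4):               # ...but study centers drift up along x AND y
    xc, yc = 3.0 * j, 2.0 * j
    x = xc + rng.uniform(-1.0, 1.0, 60)
    y = yc + within_slope * (x - xc) + rng.normal(0.0, 0.2, 60)
    groups.append((x, y))

within = [np.corrcoef(x, y)[0, 1] for x, y in groups]
x_all = np.concatenate([x for x, _ in groups])
y_all = np.concatenate([y for _, y in groups])
pooled = np.corrcoef(x_all, y_all)[0, 1]

print(within)   # all negative within each study
print(pooled)   # positive: the sign flips when the studies are pooled
```

The lurking variable here is simply "which study", which shifts both $x$ and $y$ upward together.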
If we look into the data behind these plots, the `cor(y, x)` for each individual subgroup ranged between -0.238 and -0.302; yet, when the data was aggregated, the combined correlation was 0.473. Naturally, the sign of the regression slopes was also opposite. Of note, the mixed-effects regression with random intercepts was a better model than the OLS on the aggregate data.

Already on the plot, one can have an intuition that the data is not just dispersed, but stretched along the $x$ axis, precisely by an unknown "lurking" variable, which may not be immediately apparent. This can be made objective by looking at the correlation between the $x$ variable and the $y$ intercept for each subgroup, and contrasting it with scenario 1. Graphically,
in the case with Simpson's effect (scenario 2), the higher the $x$ values, the higher the $y$ intercepts (left plot), as opposed to the lack of correlation in scenario 1 (right plot). In scenario 2, the correlation between $x$ and the $y$ intercepts was 0.7876, with a statistically significant ($p \to 0$) slope when regressing the intercepts on $x$. In contradistinction, the correlation in scenario 1 between $x$ and the $y$ intercepts was 0.

A follow-up question is what type of clinical scenario could possibly follow this pattern (scenario 2)?