Simpson's paradox is an extreme form of confounding where the apparent sign of correlation is reversed; you haven't said this is the position here.
I can see at least three possibilities here: the heterogeneity between the subgroups, the reduction in sample size in each, and a poor definition of the subgroups that presupposes the results. Ignoring the third, both of the first two can have an impact: in my experience it is often the small sample size that leads to non-significance in the smaller subgroup, and heterogeneity that causes the whole group to produce a significant result while the large subgroup does not.
That was an over-generalisation - each case will have its own issues.
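As a rough illustration of the sample-size point only, here is a minimal sketch of a pooled two-proportion z-test. The counts are hypothetical, made up for this example and not taken from the question; the helper name `two_proportion_p_value` is likewise just an illustrative choice. The same 30% vs 20% difference is tested once with 200 per arm and once in a subgroup with 50 per arm.

```python
import math

def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|))
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts: identical observed rates (30% vs 20%) in both cases.
p_whole = two_proportion_p_value(60, 200, 40, 200)   # whole group
p_subgroup = two_proportion_p_value(15, 50, 10, 50)  # smaller subgroup

print(f"whole group p = {p_whole:.3f}, subgroup p = {p_subgroup:.3f}")
```

With these made-up numbers the observed effect is identical in both comparisons, yet only the larger one falls below the conventional 0.05 threshold, so a non-significant subgroup result need not mean the effect is absent there.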
I think A and E aren't a good combination, because A says you should pick Mercy and E says you should pick Hope.
A and D have the virtue of advocating the same choice. But let's examine the line of reasoning in D in further detail, since that seems to be the source of the confusion. The probability of success for the surgeries follows the same ordering at both hospitals, with the A type being most likely to be successful and the E type being the least likely. If we collapse over (i.e., ignore) the hospitals, we can see that the marginal probability of success for the surgeries is:
Type    A     B     C     D     E     All
Prob   .81   .78   .56   .21   .08   .52
Because E is much less likely to be successful, it is reasonable to imagine that it is more difficult (although in the real world, other possibilities exist as well). We can extend that line of thinking to the other four types also. Now let's look at what proportion of each hospital's total surgeries are of each type:
Type    A     B     C     D     E
Mercy  .08   .39   .06   .44   .03
Hope   .09   .54   .23   .09   .05
What we notice here is that Hope tends to do more of the easier surgeries A-C (and especially B & C), and fewer of the harder surgeries like D. E is pretty uncommon in both hospitals, but, for what it's worth, Hope actually does a higher percentage. Nonetheless, the Simpson's Paradox effect is going to mostly be driven by B-D here (not actually column E as answer choice D suggested).
Simpson's Paradox occurs because the surgeries vary in difficulty (in general) and also because the N's differ. It is the differing base rates of the different types of surgeries that makes this counter-intuitive. What is happening would be easy to see if both hospitals did exactly the same number of each type of surgery. We can do that by simply calculating the success probabilities and multiplying by 100; this adjusts for the different frequencies:
Type    A     B     C     D     E    All
Mercy   81    79    60    21     9   250
Hope    80    76    51    14     4   225
Now, because both hospitals did 100 of each surgery (500 total), the answer is obvious: Mercy is the better hospital.
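The comparison above can be reproduced with a short sketch. It takes the per-type success rates from the standardized table (divided by 100) and the case-mix proportions from the earlier table, then contrasts each hospital's crude overall rate with its expected successes under an equal mix of 100 surgeries per type. Because the published proportions are rounded, the crude rates are approximations; the variable and function names are just illustrative choices.

```python
types = ["A", "B", "C", "D", "E"]

# Per-type success probabilities (standardized table / 100).
success = {
    "Mercy": [0.81, 0.79, 0.60, 0.21, 0.09],
    "Hope":  [0.80, 0.76, 0.51, 0.14, 0.04],
}

# Share of each hospital's surgeries that are of each type (rounded).
mix = {
    "Mercy": [0.08, 0.39, 0.06, 0.44, 0.03],
    "Hope":  [0.09, 0.54, 0.23, 0.09, 0.05],
}

def crude_rate(hospital):
    """Overall success rate, weighting each type by the hospital's own mix."""
    return sum(m * s for m, s in zip(mix[hospital], success[hospital]))

def standardized_successes(hospital, n_per_type=100):
    """Expected successes if the hospital did n_per_type of every type."""
    return sum(n_per_type * s for s in success[hospital])

crude = {h: crude_rate(h) for h in success}
standardized = {h: standardized_successes(h) for h in success}

for h in success:
    print(f"{h}: crude rate {crude[h]:.3f}, "
          f"standardized successes {standardized[h]:.0f} of 500")
```

Weighted by its own case mix, Hope comes out ahead overall even though Mercy has the higher success rate for every surgery type; with equal numbers of each type, the ordering matches the per-type comparison.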
Best Answer
For the case in which all patient descriptors are in the correct part of a causal diagram, a necessary but not sufficient condition for which is that the descriptors are assessed at "time zero" or before, Simpson's "paradox" is nothing more than a failure to ask a specific enough question. Stay away from marginal treatment effects and instead condition on all available information that is consistent with causal pathways. In the case of age and sex it is seldom inappropriate to condition on them. Treatment effects should be conditional and respect information flow. Focus on making the best treatment decision for the one patient being treated.