Solved – Intuition needed when using weighted average to explain Simpson’s paradox

simpsons-paradox

In Freedman's Statistics (chapter 2), the author uses Berkeley's graduate admission statistics (44% of men and 35% of women were admitted to graduate programs overall) to illustrate Simpson's paradox. The difference between the overall admission rates of men and women could be explained by men and women applying to different majors at different rates. As the author puts it, "The men were applying to the easy majors, the women to the harder ones."

To remedy this, the author suggests comparing a weighted average of the major-specific admission rates for each gender.

My rough understanding is this: if all applicants had the same major-specific admission rates as males, the resulting overall rate would be the weighted average admission rate for males. The same holds for females.

However, it's still not intuitive to me (either mathematically or conceptually) why these equations give more "balanced" admission rates for each gender.

For example, when one calculates the overall admission rate for each gender the "old" way, isn't that average already weighted by the (male) application volume? Granted, the new method uses the overall (not just male) application volume per major as the weights. Still, I don't see why this resolves the paradox.

If anyone could help explain these formulas to me, or point me to the concepts I need to understand them, I'd really appreciate it. I have a hunch it might be related to multivariate statistics, since there seem to be two factors in this case: gender and major. As a new statistics student, though, I may not know those concepts yet. My ultimate goal is to understand why and how these formulas work, so that in the future I know when to use them.

Best Answer

Thank you @AlexeyBurnakov for your answer; it really motivated me to find an explanation for the paradox using the concept of weights. After spending a few hours reading up on this (though with limited results as most of the material is beyond my level), my understanding of this is presented below:

For simplicity, I've reduced the original problem to one with only two majors that still retains the paradox. In each major (A and B), the admit rate for females is higher than that for males. However, overall, females are admitted at a much lower rate (11%) than males (40%)!

|         | Male applicants | Male admits | Male admit rate | Female applicants | Female admits | Female admit rate |
|---------|-----------------|-------------|-----------------|-------------------|---------------|-------------------|
| Major A | 560             | 353         | 63%             | 25                | 17            | 68%               |
| Major B | 373             | 22          | 6%              | 341               | 24            | 7%                |
| Total   | 933             | 375         | 40%             | 366               | 41            | 11%               |

Let's calculate these overall admit rates the "old" way:

Males:

$$ \frac{63\% \times 560 + 6\% \times 373}{933} \approx 40\% \tag{1} $$

That's the same as:

$$ 63\% \times \frac{560}{933} + 6\% \times \frac{373}{933} \approx 40\% \tag{2} $$

or

$$ 63\% \times 60\% + 6\% \times 40\% \approx 40\% \tag{3} $$

In short, the "old" way of calculating the overall admission rate for males is:

$$ \sum_{\text{majors } j} P(\text{admitted} \mid \text{male}, \text{major } j) \times P(\text{major } j \mid \text{male}) $$

Applying this formula to females gives the "old" overall admission rate for females:

$$ 68\% \times 7\% + 7\% \times 93\% \approx 11\% \tag{4} $$

Comparing equations (3) and (4), we can clearly see that the paradox is driven by the "weights" attached to the male and female admission rates for each major. These weights are the probabilities (or rather propensities) of males and females applying to each major. For males, they are 60% for major A and 40% for major B. For females, they are 7% for major A and 93% for major B.
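The "old" calculation in equations (1) to (4) can be checked with a short Python sketch. It assumes only the counts from the table above, and `crude_rate` is a hypothetical helper name. Each group's overall rate is written explicitly as a weighted average of its major-specific rates, using the group's own application shares as weights. The result therefore equals the plain ratio of total admits to total applicants.

```python
# Counts from the two-major example table: (applicants, admits) per major.
male = {"A": (560, 353), "B": (373, 22)}
female = {"A": (25, 17), "B": (341, 24)}

def crude_rate(group):
    """Overall admit rate as a weighted average of major-specific rates,
    weighted by this group's OWN share of applicants in each major."""
    total_applicants = sum(n for n, _ in group.values())
    rate = 0.0
    for n, admits in group.values():
        weight = n / total_applicants   # P(major | gender)
        rate += weight * (admits / n)   # times P(admitted | gender, major)
    return rate

for name, group in [("male", male), ("female", female)]:
    print(f"{name}: {crude_rate(group):.1%}")
```

This prints `male: 40.2%` and `female: 11.2%`, which are the 40% and 11% overall rates in the table.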

In other words, females very often apply to the hard major B (93%), whose admission rate is only 7%. That means the overall female admission rate is "weighted down" towards 7%, hence the overall rate of only 11%. This matches the author's explanation in the book, while also using the concept of weights to explain the paradox. The "old" overall averages are already weighted, but the weights are not fair, because they differ between males and females.

So how can we make the weights fairer? The author suggests using the same weights for both males and females. But which common weights should be used? The author suggests using the (gender-agnostic) probability that an applicant applies to a given major. This replaces the separate probabilities of males and females applying to that major (as in equations 3 and 4).

With these new weights, the overall male admission rate becomes (compare with equation 2 to see the difference):

$$ 63\% \times \frac{560 + 25}{933 + 366} + 6\% \times \frac{373 + 341}{933 + 366} \approx 31.6\% \tag{5} $$

or

$$ 63\% \times 45\% + 6\% \times 55\% \approx 31.6\% \tag{6} $$

while the "new" overall female admission rate becomes:

$$ 68\% \times 45\% + 7\% \times 55\% \approx 34.5\% \tag{7} $$

With new weights that are the same across genders, one can therefore see that females are in fact not underrepresented, as one might think from the "old" overall admission rates.
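Here is a minimal Python sketch of the "new" calculation in equations (5) to (7), again assuming only the counts from the table above. It weights every group's major-specific rates by one common, pooled distribution of applicants across majors. Epidemiologists call this direct standardization. `pooled_weights` and `standardized_rate` are hypothetical helper names.

```python
# Counts from the two-major example table: (applicants, admits) per major.
male = {"A": (560, 353), "B": (373, 22)}
female = {"A": (25, 17), "B": (341, 24)}

def pooled_weights(*groups):
    """Share of ALL applicants (both genders combined) applying to each major."""
    totals = {}
    for group in groups:
        for major, (n, _) in group.items():
            totals[major] = totals.get(major, 0) + n
    grand_total = sum(totals.values())
    return {major: n / grand_total for major, n in totals.items()}

def standardized_rate(group, weights):
    """Weighted average of this group's major-specific rates, using common weights."""
    return sum(weights[major] * admits / n for major, (n, admits) in group.items())

w = pooled_weights(male, female)
print({m: round(v, 3) for m, v in w.items()})        # {'A': 0.45, 'B': 0.55}
print(f"male: {standardized_rate(male, w):.1%}")      # male: 31.6%
print(f"female: {standardized_rate(female, w):.1%}")  # female: 34.5%
```

The only change from the "old" calculation is the weights. Both genders' rates are now averaged over the same mix of majors, so differences in where each gender applies can no longer drive the comparison.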

However, I'm still not sure why the gender-agnostic application rates would provide "fairer" overall rates. Intuitively, it kind of makes sense to me. $\frac{560+25}{933+366}$ (about 45%) lies between $\frac{560}{933}$ (60%, the high male application rate for the easy major A) and $\frac{25}{366}$ (7%, the low female application rate for that major).

However, perhaps there is a mathematical derivation for that. After all, in this example any set of weights shared by both genders would lead to the same, more accurate conclusion (that females are not underrepresented), because females have the higher admission rate in every major. So if anyone can explain why we should use this particular set of weights, and not any other, I'd love to hear it!
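The claim about common weights can be checked numerically. This minimal Python sketch uses the major-specific rates from the table above; `common_weight_rate` is a hypothetical helper. It sweeps the weight on major A from 0% to 100% and records how far the female rate lies above the male rate.

```python
# Major-specific admit rates from the two-major example table.
male_rates = {"A": 353 / 560, "B": 22 / 373}
female_rates = {"A": 17 / 25, "B": 24 / 341}

def common_weight_rate(rates, w_a):
    """Weighted average with weight w_a on major A and 1 - w_a on major B."""
    return w_a * rates["A"] + (1 - w_a) * rates["B"]

# Try weights on major A from 0% to 100% in steps of 5 percentage points,
# applying the SAME weights to both genders each time.
gaps = []
for step in range(21):
    w_a = step / 20
    gap = common_weight_rate(female_rates, w_a) - common_weight_rate(male_rates, w_a)
    gaps.append(gap)

print(min(gaps) > 0)  # True: the female rate is higher for every weighting tried
```

The female rate is higher in both majors, so the gap is a weighted average of two positive numbers and stays positive for every common weighting. The choice of weights changes the size of the gap, not its sign.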

PS. According to a paper by Westbrooke, it's better to forgo weighted averages. Instead, the data should be represented by the actual and expected numbers of admitted females (see table below). The actual number of females admitted (41) is larger than the expected number (36). That leads to the same conclusion as Freedman: females are in fact not underrepresented in graduate admissions.

|         | Female admit rate | Male admit rate | Female applicants | Actual females admitted\* | Expected females admitted\*\* |
|---------|-------------------|-----------------|-------------------|---------------------------|-------------------------------|
| Major A | 68%               | 63%             | 25                | 17                        | 16                            |
| Major B | 7%                | 6%              | 341               | 24                        | 20                            |
| Overall |                   |                 | 366               | 41                        | 36                            |

\* Using the female admission rate for that major (e.g. 68% × 25).

\*\* Using the male admission rate for that major (e.g. 63% × 25). This gives the number of females who would be admitted if they were admitted at the male rate for that major.
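The actual-versus-expected comparison can be sketched in a few lines of Python. This is a minimal sketch that assumes the counts from the two-major example table, and the variable names are illustrative. Comparing observed counts with the counts expected under a reference group's rates is usually called indirect standardization in epidemiology.

```python
# Male admit rates per major (the reference rates), from the example table.
male_rates = {"A": 353 / 560, "B": 22 / 373}
# Female (applicants, admits) per major.
female = {"A": (25, 17), "B": (341, 24)}

# Actual: females admitted at their own rates.
actual = sum(admits for _, admits in female.values())
# Expected: females admitted as if each major applied the male rate to them.
expected_by_major = {m: male_rates[m] * n for m, (n, _) in female.items()}
expected = sum(expected_by_major.values())

print(actual, round(expected))            # 41 36
print(f"ratio: {actual / expected:.2f}")  # ratio: 1.14
```

A ratio above 1 means females were admitted more often than the male rates would predict for the majors they actually applied to.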

PPS. I found the chapter on Simpson's paradox in Kadane's Principles of Uncertainty a very clear read on using weights and probabilities to explain the paradox.

Related Solutions

Confounding – Understanding Simpson’s Paradox and Confounding

Simpson's paradox is an extreme form of confounding where the apparent sign of correlation is reversed; you haven't said this is the position here.

I can see at least three possibilities here:

- heterogeneity between the subgroups,
- the reduced sample size in each subgroup, and
- poorly defined subgroups that presuppose the results.

Setting the third aside, both of the first two can have an impact. From past experience, small sample size often leads to non-significance in the smaller subgroup. Heterogeneity, in turn, often causes the whole group to produce a significant result while the large subgroup does not.

That was an over-generalisation; each case will have its own issues.

Simpson’s Paradox – Understanding the Basics of Simpson’s Paradox

I think A and E aren't a good combination, because A says you should pick Mercy and E says you should pick Hope.

A and D have the virtue of advocating the same choice. But let's examine the line of reasoning in D in more detail, since that seems to be the source of confusion. The probability of success for the surgeries follows the same ordering at both hospitals: type A is the most likely to be successful and type E the least likely. If we collapse over (i.e., ignore) the hospitals, the marginal probability of success for each type of surgery is:

| Type | A   | B   | C   | D   | E   | All |
|------|-----|-----|-----|-----|-----|-----|
| Prob | .81 | .78 | .56 | .21 | .08 | .52 |

Because E is much less likely to be successful, it is reasonable to imagine that it is more difficult (although in the real world, other explanations are possible as well). We can extend that line of thinking to the other four types. Now let's look at what proportion of each hospital's total surgeries are of each type:

| Type  | A   | B   | C   | D   | E   |
|-------|-----|-----|-----|-----|-----|
| Mercy | .08 | .39 | .06 | .44 | .03 |
| Hope  | .09 | .54 | .23 | .09 | .05 |

What we notice here is that Hope tends to do more of the easier surgeries A-C (and especially B & C), and fewer of the harder surgeries like D. E is pretty uncommon in both hospitals, but, for what it's worth, Hope actually does a higher percentage. Nonetheless, the Simpson's Paradox effect is going to mostly be driven by B-D here (not actually column E as answer choice D suggested).
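To make the case-mix effect concrete, here is a minimal Python sketch. It computes each hospital's overall success rate as a weighted average of its type-specific success rates, weighted by that hospital's own case mix. It assumes the type-specific success probabilities are the per-100 figures tabulated later in this answer, divided by 100. The case-mix proportions are rounded to two decimals, so the results are only approximate. `overall_rate` is a hypothetical helper name.

```python
# Proportion of each hospital's surgeries by type (rounded, from the table above).
case_mix = {
    "Mercy": {"A": .08, "B": .39, "C": .06, "D": .44, "E": .03},
    "Hope":  {"A": .09, "B": .54, "C": .23, "D": .09, "E": .05},
}
# Type-specific success probabilities (successes per 100 surgeries / 100).
success = {
    "Mercy": {"A": .81, "B": .79, "C": .60, "D": .21, "E": .09},
    "Hope":  {"A": .80, "B": .76, "C": .51, "D": .14, "E": .04},
}

def overall_rate(mix, rates):
    """Weighted average of type-specific success rates, weighted by case mix."""
    return sum(mix[t] * rates[t] for t in mix)

for hospital in ("Mercy", "Hope"):
    rate = overall_rate(case_mix[hospital], success[hospital])
    print(f"{hospital}: {rate:.0%}")
```

With these rounded inputs, Hope comes out at roughly 61% and Mercy at roughly 50%. Mercy still has the higher success rate for every type. Hope's heavier share of the easy type B and its light share of the hard type D produce the reversal.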

Simpson's Paradox occurs because the surgeries vary in difficulty (in general) and also because the Ns differ. The differing base rates of the different types of surgeries are what make this counter-intuitive. What is happening would be easy to see if both hospitals had done exactly the same number of each type of surgery. We can simulate that by calculating the success probabilities and multiplying by 100, which adjusts for the different frequencies:

| Type  | A  | B  | C  | D  | E | All |
|-------|----|----|----|----|---|-----|
| Mercy | 81 | 79 | 60 | 21 | 9 | 250 |
| Hope  | 80 | 76 | 51 | 14 | 4 | 225 |

Now that both hospitals are put on the same footing of 100 of each surgery (500 total), the answer is obvious: Mercy is the better hospital.
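Equal counts are only one way to put the hospitals on the same footing. This minimal Python sketch applies a few common weightings to both hospitals. It assumes the per-100 success figures from the table above (divided by 100) and the rounded case-mix proportions given earlier in this answer. The weightings are equal weights, Mercy's case mix, and Hope's case mix.

```python
# Type-specific success probabilities (successes per 100 surgeries / 100).
success = {
    "Mercy": {"A": .81, "B": .79, "C": .60, "D": .21, "E": .09},
    "Hope":  {"A": .80, "B": .76, "C": .51, "D": .14, "E": .04},
}
# Candidate weightings, each applied to BOTH hospitals.
weightings = {
    "equal (100 of each type)": {t: 0.2 for t in "ABCDE"},
    "Mercy's case mix": {"A": .08, "B": .39, "C": .06, "D": .44, "E": .03},
    "Hope's case mix": {"A": .09, "B": .54, "C": .23, "D": .09, "E": .05},
}

results = {}
for label, w in weightings.items():
    mercy = sum(w[t] * success["Mercy"][t] for t in w)
    hope = sum(w[t] * success["Hope"][t] for t in w)
    results[label] = (mercy, hope)
    print(f"{label}: Mercy {mercy:.0%}, Hope {hope:.0%}")
```

Mercy comes out ahead under each of these weightings, because it has the higher success rate for every type. As in the admissions example, the choice of common weights changes the size of the difference but not its direction.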
