I'm not totally sure what your question is, but I can remark on his claims and on your confusion about the example model.
Andrew is not quite clear whether scientific interest lies in the height-adjusted sex-income association or the sex-adjusted height-income association. In a causal model framework, sex causes height but height does not cause sex. So if we want the impact of sex, adjusting for height would introduce mediator bias (possibly collider bias too, since rich people are taller!). I find it confusing and funny when I see applied research that interprets the coefficients of the other "covariates" (confounders and precision variables) that are included in a model. Those interpretations are nonsense; the covariates simply provide adequate stratification to make the comparison that is necessary. Adjusting for height, if you are interested in inference on sex-based differences in income, is the wrong thing to do.
I agree counterfactuals are not necessary to explain Simpson's paradox. It can simply be a trait intrinsic to the data. I think both crude and adjusted RRs are in some sense correct without being causal. It is more problematic, of course, when the objective is causal analysis, and overadjustment reveals problems of non-collapsibility (which inflates an OR) and insufficient sample size.
As a reminder for the readers: Simpson's paradox is a very specific phenomenon that refers to an instance in which an association flips direction after controlling for a confounding variable. The Berkeley Admissions data was the motivating example. There, crude RRs showed women were less likely to be accepted to Berkeley. However, once stratified by departments, the RRs showed that women were more likely to be accepted in every single department. They just were more likely to apply to the difficult departments that rejected many people.
Now in causal inference theory, we would be befuddled to conceive that the department one applied to causes gender. Gender is intrinsic, right? Well, yes and no. Miettinen argues for a "study base" approach to such problems: who is the population? It is not all eligible students; it is the ones who specifically apply to Berkeley. The more competitive departments have attracted women to apply to Berkeley when they would not have applied otherwise. To expand: a woman who is profoundly intelligent wants to get into the best, say, engineering program. If Berkeley had not had a great engineering program, she would not have applied to Berkeley at all; she would have applied to MIT or CalPoly. So in that light, in the "applying student" population, department causes gender and is a confounder. (Caveat: I'm a first-gen college student, so I don't know much about which programs are renowned for what.)
So how do we summarize this data? It is true that Berkeley were more likely to admit a man who applied than a woman. And it is true that the departments of Berkeley were more likely to admit women than to admit men. Crude and stratified RRs are sensible measures even if they are non-causal. This underscores how important it is to be precise with our wording as statisticians (the humble author does not presume himself to be remotely precise).
Confounding is a phenomenon distinct from non-collapsibility, which is another form of omitted variable bias but one that is known to produce milder effects on estimates. Unlike in logistic regression, non-collapsibility does not cause bias in linear regression, and the consideration of a continuous variable in Gelman's example should have been described more thoroughly.
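To make the distinction concrete, here is a minimal numeric sketch of non-collapsibility without any confounding. The risks are made-up illustrative numbers (not taken from Gelman's example); the two strata are assumed to be equally sized and the exposure is assumed to be independent of stratum.

```python
def odds_ratio(p_exposed, p_unexposed):
    """Odds ratio comparing the risk in the exposed to the risk in the unexposed."""
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical (risk if exposed, risk if unexposed) in two equally sized strata.
strata = [(0.8, 0.5), (0.5, 0.2)]

# Stratum-specific (conditional) measures.
conditional_ors = [odds_ratio(pe, pu) for pe, pu in strata]
conditional_rds = [pe - pu for pe, pu in strata]

# Marginal risks: a plain average, because the strata are assumed equally
# sized and exposure is assumed independent of stratum (no confounding).
marginal_exposed = sum(pe for pe, _ in strata) / len(strata)
marginal_unexposed = sum(pu for _, pu in strata) / len(strata)

marginal_or = odds_ratio(marginal_exposed, marginal_unexposed)
marginal_rd = marginal_exposed - marginal_unexposed

print("conditional ORs:", [round(x, 2) for x in conditional_ors])
print("marginal OR:    ", round(marginal_or, 2))
print("conditional RDs:", [round(x, 2) for x in conditional_rds])
print("marginal RD:    ", round(marginal_rd, 2))
```

In this sketch the odds ratio is about 4 in each stratum but roughly 3.45 marginally, while the risk difference is 0.3 both within strata and marginally. That is the sense in which a difference on the linear scale collapses and an OR does not.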
Andrew's interpretation of the sex coefficient in his sex / height adjusted income model reveals the nature of the model's assumptions: the assumption of linearity. Indeed, in the linear model, such comparisons between men and women are enabled because, for a specific woman, we can predict what a male of similar height may have earned, even if he wasn't observed. This is also the case if one allows for effect modification, so that the slope of the trend in women is different from that of men. On the other hand, I don't think it's so crazy to conceive of men and women of the same height; 66 inches would indeed be a tall woman and a short man. It seems a mild projection to me, rather than gross extrapolation. Furthermore, since the model assumptions can be stated clearly, it helps readers understand that the sex-stratified income-height association bears information which is borrowed across or averaged between samples of males and females. If such an association were the object of inference, the earnest statistician would obviously consider the possibility of effect modification.
Thank you @AlexeyBurnakov for your answer; it really motivated me to find an explanation for the paradox using the concept of weights. After spending a few hours reading up on this (though with limited results as most of the material is beyond my level), my understanding of this is presented below:
For simplicity, I've reduced the original problem to one with only 2 majors that still retains the paradox: for each major (A & B), the admit rate for females is higher than that for males. However, overall, females are admitted at a much lower rate (11%) than males (40%)!
| | Male applicants | Male admits | Male admit rate | Female applicants | Female admits | Female admit rate |
|---------|-----|-----|-----|-----|-----|-----|
| Major A | 560 | 353 | 63% | 25 | 17 | 68% |
| Major B | 373 | 22 | 6% | 341 | 24 | 7% |
| Total | 933 | 375 | 40% | 366 | 41 | 11% |
Let's calculate these overall admit rates the "old" way:
Males:
$$
\dfrac{63\%\ *\ 560\ +\ 6\%\ *\ 373}{933} \ =\ 40\%\ \ \ \ \ \ \ (1)
$$
That's the same as:
$$
63\%\ *\ \dfrac{560}{933} \ +\ 6\%\ *\ \dfrac{373}{933} =40\% \ \ \ \ \ \ (2)
$$
, or
$$
63\%\ *\ 60\%\ +\ 6\%\ *\ 40\%\ =\ 40\% \ \ \ \ \ \ (3)
$$
In short, the "old" way of calculating the overall admission rate for male is:
$$
\sum P( males\ admitted\ for\ each\ major) \ *\ P( males\ applying\ to\ each\ major)
$$
Applying this formula to females will give the "old" way of calculating the overall admission rate for females:
$$
68\%\ *\ 7\%\ +\ 7\%\ *\ 93\%\ =\ 11\% \ \ \ \ \ \ (4)
$$
Comparing equations (3) and (4), we can clearly see that the reason for the paradox is the "weights" associated with the male and female admission rates for each major. More specifically, these "weights" are the probabilities (or rather propensities) of males and females applying to a certain major (60% major A and 40% major B for males, 7% major A and 93% major B for females).
In other words, the very high probability of females applying to the hard major B (93%), whose admission rate is only 7%, means the overall female admission rate will be "weighted down" towards that 7%, hence the overall rate of only 11%. This matches the author's explanation in the book, while also using the concept of "weights" to explain the paradox. The "old" overall averages are already weighted, but the weights are not quite fair as they differ between males and females.
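The "old" calculation can be sketched in a few lines of Python. The counts come from the two-major table; the names `data` and `crude_rate` are just illustrative choices.

```python
# (applicants, admits) per major, taken from the two-major table.
data = {
    "male":   {"A": (560, 353), "B": (373, 22)},
    "female": {"A": (25, 17),   "B": (341, 24)},
}


def crude_rate(majors):
    """Overall admit rate as a weighted average of per-major admit rates,
    weighted by this group's own share of applications to each major."""
    total_applicants = sum(apps for apps, _ in majors.values())
    rate = 0.0
    for apps, admits in majors.values():
        weight = apps / total_applicants   # P(group applies to this major)
        rate += (admits / apps) * weight   # P(admitted | major) * weight
    return rate


for sex, majors in data.items():
    total = sum(apps for apps, _ in majors.values())
    weights = {m: round(apps / total, 2) for m, (apps, _) in majors.items()}
    print(sex, "weights:", weights, "overall rate:", round(crude_rate(majors), 3))
```

The weighted average is algebraically the same as total admits divided by total applicants, so it gives about 40% for males and about 11% for females; only the weights differ between the two groups.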
So how can we make the weights fairer?
The author suggests using the same weights for both males and females. But what kind of common weights should we use? The author suggests using the (gender-agnostic) probability that an applicant applies to a certain major, instead of using the separate probabilities of males or females applying to that major (as seen in equations 3 & 4).
With these new weights, the overall male admission rate becomes (compare this with equation 2 to see the difference):
$$
63\%\ *\ \dfrac{560+25}{933\ +\ 366} +6\%\ *\ \dfrac{373\ +\ 341}{933\ +\ 366} \ =\ 31.6\% \ \ \ \ \ \ (5)
$$
, or
$$
63\%\ *\ 45\%+6\%\ *\ 55\%\ =\ 31.6\% \ \ \ \ \ \ (6)
$$
, while the "new" overall female admission rate becomes:
$$
68\%\ *\ 45\%+7\%\ *\ 55\%\ =\ 34.5\% \ \ \ \ \ \ (7)
$$
Therefore, with new weights that are the same across genders, one can see that females are in fact not underrepresented as one might think from the "old" overall admission rates.
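Here is a minimal sketch of the "new" calculation, using the pooled (gender-agnostic) application shares as common weights. The counts are from the two-major table; `pooled_weight` and `standardized_rate` are names made up for this example.

```python
# (applicants, admits) per major, taken from the two-major table.
data = {
    "male":   {"A": (560, 353), "B": (373, 22)},
    "female": {"A": (25, 17),   "B": (341, 24)},
}
majors = ["A", "B"]

# Gender-agnostic weight: share of ALL applicants who applied to each major.
total_applicants = sum(apps for group in data.values() for apps, _ in group.values())
pooled_weight = {
    m: sum(data[sex][m][0] for sex in data) / total_applicants for m in majors
}


def standardized_rate(group):
    """Weighted average of a group's per-major admit rates, using the
    pooled weights instead of the group's own application shares."""
    return sum((group[m][1] / group[m][0]) * pooled_weight[m] for m in majors)


male_rate = standardized_rate(data["male"])
female_rate = standardized_rate(data["female"])

print("pooled weights:", {m: round(w, 2) for m, w in pooled_weight.items()})
print("male:", round(male_rate, 3), "female:", round(female_rate, 3))
```

With unrounded per-major rates this gives roughly 31.6% for males and 34.5% for females, in line with equations (6) and (7).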
However, I'm still not sure why the gender-agnostic application rates would provide "fairer" overall rates. Intuitively, it kind of makes sense to me: $\dfrac{560+25}{933\ +\ 366}$ is between $\dfrac{560}{933}$ (the high male application rate for the easy major A) and $\dfrac{25}{366}$ (the low female application rate for that major).
However, perhaps there are mathematical derivations for that. After all, any set of weights that is common to both genders would lead to the same qualitative result (that females are not underrepresented), so if anyone has further explanations on why we should use that particular set of weights, and not any other, I'd love to hear it!
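On the narrower point that any common set of weights gives the same qualitative answer, a short derivation may help. The notation is introduced here for illustration and is not from the book: $w \in [0,1]$ is the common weight on major A, and $r_{F,A}, r_{M,A}, r_{F,B}, r_{M,B}$ are the per-major admit rates for females and males.

$$
\big[w\, r_{F,A} + (1-w)\, r_{F,B}\big] - \big[w\, r_{M,A} + (1-w)\, r_{M,B}\big] \ =\ w\,( r_{F,A} - r_{M,A}) \ +\ (1-w)\,( r_{F,B} - r_{M,B})
$$

Both differences on the right are positive here (68% vs 63% and 7% vs 6%), and $w$ and $1-w$ are non-negative and sum to 1, so the weighted female rate exceeds the weighted male rate for every choice of $w$. The pooled application shares (45% and 55%) are one such choice; this shows only that the direction of the comparison does not depend on it, not that it is the best choice.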
PS. According to this paper by Westbrooke, it's better to forgo weighted averages and instead use the actual and expected numbers of admitted females to represent the data (see table below). From this representation, we can see that the actual number of females admitted (41) is larger than the expected number (36), hence reaching the same conclusion as Freedman: females are in fact not underrepresented in graduate admission.
| | Female admit rate | Male admit rate | Female applicants | Actual females admitted\* | Expected females admitted\*\* |
|---------|-----|-----|-----|-----|-----|
| Major A | 68% | 63% | 25 | 17 | 16 |
| Major B | 7% | 6% | 341 | 24 | 20 |
| Overall | | | 366 | 41 | 36 |

\* using the female admission rate for that major (e.g. 68% * 25)

\*\* using the male admission rate for that major (e.g. 63% * 25, to verify if females are as represented as males are when it comes to admission for that major)
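The actual-versus-expected comparison can be reproduced as follows. This is a sketch using the counts from the two-major table; the variable names are illustrative and not taken from Westbrooke's paper.

```python
# Male admit rate per major, and female (applicants, admits) per major,
# both taken from the two-major table.
male_rate = {"A": 353 / 560, "B": 22 / 373}
female = {"A": (25, 17), "B": (341, 24)}

# Actual number of females admitted.
actual = sum(admits for _, admits in female.values())

# Expected number of females admitted if each major had admitted
# females at the male rate for that major.
expected_by_major = {m: male_rate[m] * apps for m, (apps, _) in female.items()}
expected = sum(expected_by_major.values())

print("actual:", actual)
print("expected by major:", {m: round(e, 1) for m, e in expected_by_major.items()})
print("expected total:", round(expected, 1))
```

The expected total comes out just under 36 when the unrounded male rates are used, against 41 actual admits.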
PPS. I found the chapter on Simpson's paradox in Kadane's Principles of Uncertainty a very clear read on using weights and probabilities to explain the paradox.
Best Answer
Here is a general approach to understanding Simpson's Paradox algebraically for count data.
Suppose that we have survival data for an exposure and we create a 2x2 contingency table. To keep things simple we will have the same counts in each cell. We could relax this, but it would make the algebra quite messy.
\begin{array}{|c|c|c|c|} \hline & \text{Died} & \text{Survived} & \text{Death Rate} \\ \hline \text{Exposed} & X & X & 0.5 \\ \hline \text{Unexposed}& X & X & 0.5\\ \hline \end{array}
In this case, the Death Rate is the same in both the Exposed and Unexposed groups.
Now, if we split the data, say into one group for females and another group for males, we obtain 2 tables, with the following counts:
Males: \begin{array}{|c|c|c|c|} \hline & \text{Died} & \text{Survived} & \text{Death Rate} \\ \hline \text{Exposed} & Xa & Xb & \frac{a}{a+b} \\ \hline \text{Unexposed}& Xc & Xd & \frac{c}{c+d}\\ \hline \end{array}
and for females: \begin{array}{|c|c|c|c|} \hline & \text{Died} & \text{Survived} & \text{Death Rate} \\ \hline \text{Exposed} & X(1-a) & X(1-b) & \frac{1-a}{2-a-b} = \frac{a-1}{a+b-2} \\ \hline \text{Unexposed}& X(1-c) & X(1-d) & \frac{1-c}{2-c-d} = \frac{c-1}{c+d-2}\\ \hline \end{array}
where $a,b,c,d \in [0,1]$ are the proportions of each cell in the aggregated data table that are male.
Simpson's Paradox will occur when the death rate for exposed males is less than the death rate for unexposed males AND the death rate for exposed females is less than the death rate for unexposed females. Alternatively, it will also occur when the death rate for exposed males is greater than the death rate for unexposed males AND the death rate for exposed females is greater than the death rate for unexposed females. That is, when
$$\left(\frac{a}{a+b} < \frac{c}{c+d}\right) \text{ and } \left(\frac{a-1}{a+b-2} < \frac{c-1}{c+d-2}\right)$$
$$ \text{Or }$$
$$\left(\frac{a}{a+b} > \frac{c}{c+d}\right) \text{ and } \left(\frac{a-1}{a+b-2} > \frac{c-1}{c+d-2}\right)$$
As a concrete example, let $X=100$, and $a=0.5, b=0.8, c=0.9$. Then we will have Simpson's paradox when:
$$\left(\frac{0.5}{0.5+0.8} < \frac{0.9}{0.9+d}\right) \text{ and } \left(\frac{0.5-1}{0.5+0.8-2} < \frac{0.9-1}{0.9+d-2}\right)$$
$$ (-0.9 < d < 1.44) \text{ and } (0.96 < d < 1.1) $$
From which we conclude that $d$ must lie in $(0.96,1]$.
The 2nd set of inequalities gives:
$$\left(\frac{0.5}{0.5+0.8} > \frac{0.9}{0.9+d}\right) \text{ and } \left(\frac{0.5-1}{0.5+0.8-2} > \frac{0.9-1}{0.9+d-2}\right)$$
$$ (d < -0.9 \text{ or } d>1.44) \text{ and } (d < 0.96 \text{ or } d > 1.1) $$
which has no solution for $d \in [0,1]$.
So for the three values that we chose for $a,b,$ and $c$, to invoke Simpson's paradox, $d$ must be greater than 0.96. In the case where the value was $0.99$ then we would obtain a Death Rate for Males of
$$ 0.5/ (0.5+0.8) = 38\% \text{ in the exposed group} $$ $$ 0.9/ (0.9+0.99) = 48\% \text{ in the unexposed group} $$
and for Females:
$$ (0.5-1)/ (0.5+0.8-2) = 71\% \text{ in the exposed group} $$ $$ (0.9-1)/ (0.9+0.99-2) = 91\% \text{ in the unexposed group} $$
So, males have a higher death rate in the unexposed group than in the exposed group, and females also have a higher death rate in the unexposed group than the exposed group, yet the death rates in the aggregated data are the same for exposed and unexposed.
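The worked example can be checked numerically with a short sketch. The function names are made up for illustration; the female rates are written as $(1-a)/(2-a-b)$, which equals the $(a-1)/(a+b-2)$ form used in the text, and $X$ cancels out of every rate.

```python
def death_rates(a, b, c, d):
    """Return ((male exposed, male unexposed), (female exposed, female unexposed)).

    a, b, c, d are the male shares of the four aggregated cells:
    exposed-died, exposed-survived, unexposed-died, unexposed-survived.
    Assumes the denominators are non-zero (true for the values used below).
    """
    males = (a / (a + b), c / (c + d))
    females = ((1 - a) / (2 - a - b), (1 - c) / (2 - c - d))
    return males, females


def same_direction(a, b, c, d):
    """True when exposure has the same-signed association with death in both
    sexes, even though the aggregated death rates are equal (0.5 vs 0.5)."""
    (m_exp, m_unexp), (f_exp, f_unexp) = death_rates(a, b, c, d)
    both_lower = m_exp < m_unexp and f_exp < f_unexp
    both_higher = m_exp > m_unexp and f_exp > f_unexp
    return both_lower or both_higher


a, b, c = 0.5, 0.8, 0.9

males, females = death_rates(a, b, c, 0.99)
print("males:  ", [round(r, 2) for r in males])
print("females:", [round(r, 2) for r in females])

# Spot-check a few values of d on either side of the 0.96 boundary.
for d in (0.50, 0.95, 0.97, 0.99):
    print(d, same_direction(a, b, c, d))
```

For $d = 0.99$ this reproduces the 38%, 48%, 71% and 91% figures, and the spot checks agree with the condition $d > 0.96$.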