I'm not totally sure of your question, but I can remark on his claims and on your confusion about the example model.
Andrew is not quite clear on whether scientific interest lies in the height-adjusted sex-income association or the sex-adjusted height-income association. In a causal model framework, sex causes height but height does not cause sex. So if we want the impact of sex, adjusting for height would introduce mediator bias (possibly collider bias too, since rich people are taller!). I find it confusing and funny when I see applied research that interprets the coefficients of the other "covariates" (confounders and precision variables) which are included in a model. Those coefficients are nonsense as findings; the covariates simply provide adequate stratification to make the comparison that is necessary. Adjusting for height, if you are interested in inference on sex-based differences in income, is the wrong thing to do.
I agree counterfactuals are not necessary to explain Simpson's paradox. The paradox can simply be a trait intrinsic to the data. I think both crude and adjusted RRs are in some sense correct without being causal. It is more problematic, of course, when the objective is causal analysis, and overadjustment reveals problems of non-collapsibility (which inflates an OR) and insufficient sample size.
As a reminder for readers: Simpson's paradox is a very specific phenomenon in which an association flips direction after controlling for a confounding variable. The Berkeley admissions data were the motivating example. There, crude RRs showed women were less likely to be accepted to Berkeley. However, once stratified by department, the RRs showed that women were as likely or more likely to be accepted in most departments. They were simply more likely to apply to the difficult departments that rejected many people.
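The reversal can be reproduced with a few lines of arithmetic. The sketch below uses hypothetical counts for two made-up departments (they are not the Berkeley figures) and computes the crude and department-stratified RRs for women versus men.

```python
# Hypothetical counts, invented only to illustrate the reversal.
# Each entry is (applicants, admitted) for one department and sex.
counts = {
    "easy": {"men": (80, 60), "women": (20, 16)},
    "hard": {"men": (20, 4), "women": (80, 20)},
}


def risk(applied, admitted):
    """Proportion of applicants who were admitted."""
    return admitted / applied


def risk_ratio(women, men):
    """RR of admission, women versus men."""
    return risk(*women) / risk(*men)


def pooled(sex):
    """Collapse over departments to get the crude (applicants, admitted)."""
    applied = sum(dept[sex][0] for dept in counts.values())
    admitted = sum(dept[sex][1] for dept in counts.values())
    return applied, admitted


stratified_rr = {
    name: risk_ratio(dept["women"], dept["men"]) for name, dept in counts.items()
}
crude_rr = risk_ratio(pooled("women"), pooled("men"))

print("stratified RRs:", stratified_rr)  # both above 1
print("crude RR:", crude_rr)             # below 1
```

In these made-up numbers women have the higher admission rate within each department, yet the lower rate overall, because most of them applied to the department that rejects most applicants.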
Now in causal inference theory, we would be befuddled to conceive that the department one applied to causes gender. Gender is intrinsic, right? Well, yes and no. Miettinen argues for a "study base" approach to such problems: who is the population? It is not all eligible students, it is the ones who specifically apply to Berkeley. The more competitive departments have attracted the women to apply to Berkeley when they would not have applied otherwise. To expand: a woman who is profoundly intelligent wants to get into the best, say, engineering program. If Berkeley had not had a great engineering program, she would not have applied to Berkeley at all; she would have applied to MIT or CalPoly. So in that light, in the "applying student" population, department causes gender and is a confounder. (Caveat: I'm a first-gen college student, so I don't know much about which programs are renowned for what.)
So how do we summarize these data? It is true that Berkeley was more likely to admit a man who applied than a woman. And it is true that the departments of Berkeley were more likely to admit women than to admit men. Crude and stratified RRs are sensible measures even if they are non-causal. This underscores how important it is to be precise with our wording as statisticians (the humble author does not presume himself to be remotely precise).
Confounding is a phenomenon distinct from non-collapsibility, another form of omitted variable bias but one which is known to produce milder effects on estimates. Unlike in logistic regression, non-collapsibility does not cause bias in linear regression, and the use of a continuous outcome in Gelman's example should have been described more thoroughly.
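A small numeric sketch may make the distinction concrete. The risks below are hypothetical, and the stratifying variable is split 50/50 in both exposure groups, so there is no confounding by construction. The OR still changes when the strata are pooled, while the risk difference (the linear-scale measure) does not.

```python
# Hypothetical outcome risks by stratum and exposure (illustrative only).
risks = {
    "stratum_0": {"unexposed": 0.2, "exposed": 0.5},
    "stratum_1": {"unexposed": 0.5, "exposed": 0.8},
}
# Each stratum makes up half of BOTH exposure groups: no confounding.
share = {"stratum_0": 0.5, "stratum_1": 0.5}


def odds(p):
    return p / (1.0 - p)


def odds_ratio(p_exposed, p_unexposed):
    return odds(p_exposed) / odds(p_unexposed)


# Stratum-specific (conditional) measures.
conditional_or = {
    s: odds_ratio(r["exposed"], r["unexposed"]) for s, r in risks.items()
}
conditional_rd = {s: r["exposed"] - r["unexposed"] for s, r in risks.items()}

# Marginal risks: average the stratum risks with the stratum shares.
marginal = {
    group: sum(share[s] * risks[s][group] for s in risks)
    for group in ("unexposed", "exposed")
}
marginal_or = odds_ratio(marginal["exposed"], marginal["unexposed"])
marginal_rd = marginal["exposed"] - marginal["unexposed"]

print("conditional ORs:", conditional_or)  # about 4 in each stratum
print("marginal OR:", marginal_or)         # about 3.45, closer to 1
print("conditional RDs:", conditional_rd)  # about 0.3 in each stratum
print("marginal RD:", marginal_rd)         # about 0.3 as well
```

The adjusted OR is further from 1 than the crude OR even though nothing is confounded, which is the sense in which adjustment inflates an OR.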
Andrew's interpretation of the sex coefficient in his sex / height adjusted income model reveals the nature of the model's assumptions: the assumption of linearity. Indeed, in the linear model, such comparisons between men and women are enabled because for a specific woman, we can predict what a man of similar height would have earned, even if he wasn't observed. This is also the case if one allows for effect modification, so that the slope of the trend in women is different from that in men. On the other hand, I don't think it's so crazy to conceive of men and women of the same height: 66 inches would indeed be a tall woman and a short man. It seems a mild projection to me, rather than gross extrapolation. Furthermore, since the model assumptions can be stated clearly, it helps readers understand that the sex-stratified income-height association bears information which is borrowed across or averaged between samples of males and females. If such an association were the object of inference, the earnest statistician would obviously consider the possibility of effect modification.
It's not clear whether you want estimates of height for each individual man and woman (more of a classification problem) or to characterize the distribution of heights of each sex. I will assume the latter. You also do not specify what additional information you are using in your model, so I will confine myself to addressing the case where you only have height data (and sex data, in the case of non-US citizens).
I recommend simply fitting a mixture of distributions to the height data from the US only, because the distributions of height in men and women are reasonably different. This would estimate the parameters of two distributions that when summed together best describe the variation in the data. The parameters of these distributions (mean and variance, since a Gaussian distribution should work fine) give you the information you are after. The R packages mixtools and mixdist let you do this; I'm sure there are many more as well.
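For readers who want to see what such a fit does, here is a minimal sketch of EM for a two-component Gaussian mixture, written in plain Python rather than with the R packages named above (it does not reproduce their APIs). The function name `fit_two_gaussians`, the starting values, and the simulated heights are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd) distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(heights, n_iter=200):
    """Fit a two-component Gaussian mixture by EM.

    Returns (weights, means, sds), each a list of length 2.
    """
    xs = sorted(heights)
    n = len(xs)
    half = n // 2
    # Crude start: lower half versus upper half of the sorted data.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    grand_mean = sum(xs) / n
    grand_sd = math.sqrt(sum((x - grand_mean) ** 2 for x in xs) / n)
    sds = [grand_sd, grand_sd]
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: probability that each point belongs to component 0.
        resp0 = []
        for x in xs:
            d0 = weights[0] * normal_pdf(x, means[0], sds[0])
            d1 = weights[1] * normal_pdf(x, means[1], sds[1])
            resp0.append(d0 / (d0 + d1))
        # M-step: responsibility-weighted proportion, mean and sd.
        for k in (0, 1):
            r = resp0 if k == 0 else [1.0 - p for p in resp0]
            nk = sum(r)
            weights[k] = nk / n
            means[k] = sum(rk * x for rk, x in zip(r, xs)) / nk
            var = sum(rk * (x - means[k]) ** 2 for rk, x in zip(r, xs)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)  # guard against collapse
    return weights, means, sds


# Simulated heights in cm (hypothetical parameters, not real survey data).
random.seed(1)
simulated = [random.gauss(162, 6) for _ in range(1000)]
simulated += [random.gauss(176, 6) for _ in range(1000)]

weights, means, sds = fit_two_gaussians(simulated)
print("mixing proportions:", weights)
print("component means:", means)
print("component sds:", sds)
```

The component with the larger mean would then be read as the male distribution. With real data the two groups overlap considerably, so the estimates are less stable than with well-separated components, and it is worth trying several starting values.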
This solution may seem odd, because it leaves out all the information you have from outside the US, where you know the sex and height of each individual. But I think it is justified because:
1) We have a very strong prior expectation that men are on average taller than women. Wikipedia's List of average human height worldwide shows not even one country or region where women are taller than men. So the identity of the distribution with the greater mean height is not really in doubt.
2) Integrating more specific information from the non-US data will likely involve making the assumption that the covariance between sex and height is the same outside the US as inside. But this is not entirely true - the same Wikipedia list indicates that the ratio of male to female heights varies between approximately 1.04 and 1.13.
3) Your international data may be much more complicated to analyse because people in different countries have wide variation in height distributions as well. You may therefore need to consider modelling mixtures of mixtures of distributions. This may also be true in the US, but it is likely to be less of a problem than a dataset that includes the Dutch (mean height: 184 cm) and Indonesians (mean height: 158 cm). And those are country-level averages; subpopulations differ to an even greater degree.
Best Answer
It sounds like you want a fully interacted model.
In which case, you interact all the terms with the dummy variable:
$Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \beta_3 X_{3,i} + \beta_4 (X_{1,i}*X_{3,i}) + \beta_5 (X_{2,i}*X_{3,i}) + \epsilon_i$
$\beta_1$ and $\beta_2$ are the effects of $X_1$ and $X_2$ when $X_{3,i} = 0$
$\beta_1 + \beta_4$ is the effect of $X_1$ when $X_{3,i} = 1$
$\beta_2 + \beta_5$ is the effect of $X_2$ when $X_{3,i} = 1$
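To see where these sums come from, substitute the two values of the dummy into the equation above (a short derivation, assuming $X_3$ is coded 0/1):

$$X_{3,i} = 0: \quad Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \epsilon_i$$

$$X_{3,i} = 1: \quad Y_i = (\beta_0 + \beta_3) + (\beta_1 + \beta_4) X_{1,i} + (\beta_2 + \beta_5) X_{2,i} + \epsilon_i$$

So $\beta_3$ is the shift in the intercept between the two groups, while $\beta_4$ and $\beta_5$ are the differences in the slopes of $X_1$ and $X_2$.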