I'm not totally sure what your question is, but I can remark on his claims and on your confusion about the example model.
Andrew is not quite clear about whether the scientific interest lies in the height-adjusted sex-income association or the sex-adjusted height-income association. In a causal model framework, sex causes height but height does not cause sex. So if we want the effect of sex, adjusting for height would introduce mediator bias (and possibly collider bias too, since rich people are taller!). I find it confusing and funny when I see applied research that interprets the other "covariates" (confounders and precision variables) included in a model. Those coefficients are nonsense in themselves; the variables simply provide adequate stratification to make the comparison that is needed. Adjusting for height, if you are interested in inference on sex-based differences in income, is the wrong thing to do.
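To make the mediator problem concrete, here is a toy structural model (all coefficient values invented for illustration): sex raises height, and both sex and height raise income. A height-adjusted model recovers only the direct path of sex on income, not the total effect.

```python
# Hypothetical structural coefficients: sex -> height -> income, plus a
# direct sex -> income path. None of these numbers come from real data.
b_sex_height = 5.0       # inches added by male sex
b_height_income = 800.0  # income per inch of height
b_sex_income = 2_000.0   # direct effect of sex on income

# Total effect of sex on income: direct path plus the mediated path.
total_effect = b_sex_income + b_sex_height * b_height_income   # 6000.0

# What a height-adjusted model estimates: the direct path only.
direct_effect = b_sex_income                                   # 2000.0

print(total_effect, direct_effect)
```

The gap between the two numbers is exactly the part of the sex effect transmitted through height, which adjustment removes.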
I agree that counterfactuals are not necessary to explain Simpson's paradox; it can simply be a trait intrinsic to the data. I think both crude and adjusted RRs are in some sense correct without being causal. Matters are more problematic, of course, when the objective is causal analysis, and overadjustment reveals problems of non-collapsibility (which inflates a conditional OR) and insufficient sample size.
As a reminder for readers: Simpson's paradox is a very specific phenomenon in which an association flips direction after controlling for another variable. The Berkeley admissions data were the motivating example. There, crude RRs showed women were less likely to be admitted to Berkeley. However, once stratified by department, the RRs showed that women were admitted at higher rates in most departments. Women were simply more likely to apply to the difficult departments that rejected many applicants of both sexes.
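A small numeric sketch of the flip (these counts are invented, not the real Berkeley data, and are chosen so the reversal appears in every stratum):

```python
# Hypothetical admissions counts as (admitted, applied) per department and
# gender; men mostly apply to the easy department, women to the hard one.
data = {
    "easy": {"men": (480, 800), "women": (70, 100)},
    "hard": {"men": (20, 200),  "women": (120, 800)},
}

def risk(admitted, applied):
    return admitted / applied

# Crude risk ratio (women vs men), pooling over departments.
adm = {g: sum(data[d][g][0] for d in data) for g in ("men", "women")}
app = {g: sum(data[d][g][1] for d in data) for g in ("men", "women")}
crude_rr = risk(adm["women"], app["women"]) / risk(adm["men"], app["men"])

# Department-specific risk ratios: both favor women.
strat_rr = {d: risk(*data[d]["women"]) / risk(*data[d]["men"]) for d in data}

print(round(crude_rr, 3))                              # 0.422, below 1
print({d: round(r, 3) for d, r in strat_rr.items()})   # both above 1
```

The crude RR says women fare worse; the stratified RRs say women fare better in each department, because the stratifying variable also determines how hard it is to get in.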
Now, in causal inference theory, we would be befuddled by the idea that the department one applies to causes gender. Gender is intrinsic, right? Well, yes and no. Miettinen argues for a "study base" approach to such problems: who is the population? It is not all eligible students; it is those who actually applied to Berkeley. The more competitive departments attracted women to apply to Berkeley who would not have applied otherwise. To expand: a profoundly intelligent woman wants to get into the best, say, engineering program. If Berkeley did not have a great engineering program, she would not have applied to Berkeley at all; she would have applied to MIT or CalPoly. In that light, within the "applying student" population, department causes gender and is a confounder. (Caveat: I'm a first-generation college student, so I don't know much about which programs are renowned for what.)
So how do we summarize these data? It is true that Berkeley was more likely to admit a man who applied than a woman who applied. And it is true that the departments of Berkeley were more likely to admit women than men. Crude and stratified RRs are both sensible measures even though they are non-causal. This underscores how important it is for us statisticians to be precise in our wording (the humble author does not presume himself to be remotely precise).
Confounding is a phenomenon distinct from non-collapsibility, another reason crude and adjusted estimates can differ, though one known to produce milder discrepancies. Unlike with logistic regression, non-collapsibility does not arise in linear regression, and the role of the continuous outcome in Gelman's example should have been described more thoroughly.
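Non-collapsibility can be shown with a toy calculation (all risks invented): the covariate below is balanced across arms, so it is not a confounder, yet the marginal OR still differs from the common conditional OR.

```python
# Made-up outcome risks by (stratum, arm); the two strata are equally sized
# in both arms, so the stratifying variable is not a confounder.
def odds(p):
    return p / (1 - p)

risk = {"s1": {"treated": 0.8, "control": 0.5},
        "s2": {"treated": 0.5, "control": 0.2}}

# Conditional ORs: both strata give an OR of 4.
cond_or = {s: odds(r["treated"]) / odds(r["control"]) for s, r in risk.items()}

# Marginal risks average over the equally sized strata.
marg = {arm: (risk["s1"][arm] + risk["s2"][arm]) / 2
        for arm in ("treated", "control")}
marg_or = odds(marg["treated"]) / odds(marg["control"])

print({s: round(v, 2) for s, v in cond_or.items()}, round(marg_or, 2))
# conditional ORs are 4.0, the marginal OR is about 3.45
```

The conditional OR (4.0) sits further from 1 than the marginal OR (about 3.45) even with no confounding at all, which is why adjusted ORs from logistic regression tend to look "inflated" relative to crude ones.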
Andrew's interpretation of the sex coefficient in his sex- and height-adjusted income model reveals the nature of the model's assumptions, chiefly linearity. Indeed, in the linear model such comparisons between men and women are possible because, for a specific woman, we can predict what a similar-height man might have earned, even if he was never observed. This remains the case if one allows for effect modification, so that the slope of the trend in women differs from that in men. On the other hand, I don't think it's so crazy to conceive of men and women of the same height; 66 inches would indeed be a tall woman and a short man. It seems a mild projection to me rather than gross extrapolation. Furthermore, since the model's assumptions can be stated clearly, they help readers understand that the sex-stratified income-height association carries information which is borrowed across, or averaged between, the samples of males and females. If such an association were the object of inference, the earnest statistician would obviously consider the possibility of effect modification.
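The "mild projection" can be written out explicitly. With invented coefficients for a model of income on sex and height, including an interaction (effect modification), the fitted line lets us compare a 66-inch woman to a 66-inch man even if no such man were in the sample:

```python
# Hypothetical fitted coefficients for income ~ sex * height; every number
# here is made up purely to illustrate the linearity assumption.
b0 = 10_000        # intercept
b_male = 4_000     # shift for male sex
b_height = 500     # income per inch for women
b_inter = 120      # extra income per inch for men (effect modification)

def predicted_income(male, height_in):
    # male is 0 for women, 1 for men; the slope differs by sex.
    return b0 + b_male * male + (b_height + b_inter * male) * height_in

woman_66 = predicted_income(0, 66)  # 10000 + 500*66      = 43000
man_66 = predicted_income(1, 66)    # 14000 + 620*66      = 54920
print(man_66 - woman_66)            # sex "effect" at 66 inches: 11920
```

Note that with the interaction term, the sex comparison depends on the height at which it is evaluated, which is exactly what effect modification means.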
The formula notation used to specify models in R can be quite handy, in this case with the factor variables explicitly noted:
Y ~ age + calendar + factor(teacher) + factor(gender) + factor(prep_course)
You could expand this to indicate more specifically that it is a logistic regression, and perhaps to indicate the reference levels of the factor variables (although that probably isn't so important for your presentation).
Best Answer
When you have a regression model with one or more categorical variables, one level of each of those variables is taken as the reference level, and the model is fitted taking these reference levels into account (for example, the level "man" of your gender variable).
Then you interpret it as follows: when gender is "man", the coefficient associated with "woman" has no effect on the response variable (you can think of the "woman" dummy as 0). When gender is "woman", this variable takes the value 1, so the response is shifted by the associated coefficient. So if the "woman" coefficient is positive, the model says that women have higher incomes on average; if it is negative, it's just the other way around.
The same happens with your education variable, but in this case it has three levels. "no qualification" is the reference level, and you should apply the coefficients of "higher-intermediate" or "graduate-or-more" only when predicting the response for people with those levels.
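The reference-level mechanics above can be sketched by hand. This is a minimal illustration of treatment (dummy) coding, assuming reference levels "man" and "no qualification"; the coefficient values are invented, not fitted from any data:

```python
# Invented coefficients in the naming style R uses for dummy-coded factors.
coefs = {
    "(Intercept)": 20_000,
    "genderwoman": -1_500,
    "educhigher-intermediate": 3_000,
    "educgraduate-or-more": 7_000,
}

def dummy_row(gender, educ):
    # Reference levels ("man", "no qualification") contribute nothing:
    # all of their dummies are zero, leaving only the intercept.
    return {
        "(Intercept)": 1,
        "genderwoman": 1 if gender == "woman" else 0,
        "educhigher-intermediate": 1 if educ == "higher-intermediate" else 0,
        "educgraduate-or-more": 1 if educ == "graduate-or-more" else 0,
    }

def predict(gender, educ):
    row = dummy_row(gender, educ)
    return sum(coefs[k] * row[k] for k in coefs)

print(predict("man", "no qualification"))    # 20000: intercept only
print(predict("woman", "graduate-or-more"))  # 20000 - 1500 + 7000 = 25500
```

A man with no qualification gets the intercept alone, since he sits at both reference levels; every other prediction adds the coefficients of whichever non-reference levels apply.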