The Book of Why by Judea Pearl: Why is he bashing statistics?

causality

I am reading The Book of Why by Judea Pearl, and it is getting under my skin[1]. Specifically, it appears to me that he is unconditionally bashing "classical" statistics by putting up a straw man argument that statistics is never, ever able to investigate causal relations, that it is never interested in causal relations, and that statistics "became a model-blind data reduction enterprise". Statistics becomes an ugly s-word in his book.

For example:

Statisticians have been immensely confused about what variables should and should not be controlled for, so the default practice has been to control for everything one can measure. […] It is a convenient, simple procedure to follow, but it is both wasteful and ridden with errors. A key achievement of the Causal Revolution has been to bring an end to this confusion.

At the same time, statisticians greatly underrate controlling in the sense that they are loath to talk about causality at all […]

However, causal models have been in statistics like, forever. I mean, a regression model can essentially be used as a causal model, since we are assuming that one variable is the cause and another is the effect (which is why regression modelling is a different approach from correlation) and testing whether this causal relationship explains the observed patterns.

Another quote:

No wonder statisticians in particular found this puzzle [The Monty Hall problem] hard to comprehend. They are accustomed to, as R.A. Fisher (1922) put it, "the reduction of data" and ignoring the data-generating process.
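A small simulation may make the point about the data-generating process concrete. It is a sketch under my own assumptions, not code from the book: the function name `monty_hall` and the two host behaviours are illustrative. In the standard game the host knows where the car is and always opens a goat door, so switching should win about 2/3 of the time. If instead the host opens one of the other two doors at random and we keep only the games in which a goat happens to appear, the kept games look the same as before, yet switching should win only about 1/2 of the time.

```python
import random


def monty_hall(n_trials: int, host_knows: bool, seed: int = 0) -> float:
    """Return the fraction of kept games in which switching wins.

    Illustrative assumptions:
    host_knows=True  -> the host always opens a goat door you did not pick.
    host_knows=False -> the host opens one of the two other doors at random;
                        games where he happens to reveal the car are dropped.
    """
    rng = random.Random(seed)
    kept = switch_wins = 0
    for _ in range(n_trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        others = [d for d in range(3) if d != pick]
        if host_knows:
            # The host's choice depends on both your pick and the car's location.
            opened = rng.choice([d for d in others if d != car])
        else:
            opened = rng.choice(others)
            if opened == car:
                continue  # car revealed: not comparable to the standard game
        kept += 1
        # Switching means taking the one door that is neither picked nor opened.
        switched = next(d for d in range(3) if d not in (pick, opened))
        switch_wins += int(switched == car)
    return switch_wins / kept


print(monty_hall(100_000, host_knows=True))
print(monty_hall(100_000, host_knows=False))
```

The first call should print a value near 0.667 and the second a value near 0.5. The only thing that differs between the two runs is the process that chose which door was opened.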

This reminds me of the reply Andrew Gelman wrote to the famous xkcd cartoon on Bayesians and frequentists: "Still, I think the cartoon as a whole is unfair in that it compares a sensible Bayesian to a frequentist statistician who blindly follows the advice of shallow textbooks."

The amount of misrepresentation of the s-word which, as I perceive it, exists in Judea Pearl's book made me wonder whether causal inference (which hitherto I perceived as a useful and interesting way of organizing and testing a scientific hypothesis[2]) is questionable.

Questions: do you think that Judea Pearl is misrepresenting statistics, and if yes, why? Just to make causal inference sound bigger than it is? Do you think that causal inference is a Revolution with a big R which really changes all our thinking?

Edit:

The questions above are my main issue, but since they are, admittedly, opinionated, please answer these concrete questions: (1) What is the meaning of the "Causal Revolution"? (2) How is it different from "orthodox" statistics?


1. Also because he is such a modest guy.
2. I mean in the scientific, not statistical sense.

EDIT: Andrew Gelman wrote this blog post on Judea Pearl's book, and I think he did a much better job of explaining my problems with it than I did. Here are two quotes:

On page 66 of the book, Pearl and Mackenzie write that statistics “became a model-blind data reduction enterprise.” Hey! What the hell are you talking about?? I’m a statistician, I’ve been doing statistics for 30 years, working in areas ranging from politics to toxicology. “Model-blind data reduction”? That’s just bullshit. We use models all the time.

And another one:

Look. I know about the pluralist’s dilemma. On one hand, Pearl believes that his methods are better than everything that came before. Fine. For him, and for many others, they are the best tools out there for studying causal inference. At the same time, as a pluralist, or a student of scientific history, we realize that there are many ways to bake a cake. It’s challenging to show respect to approaches that you don’t really work for you, and at some point the only way to do it is to step back and realize that real people use these methods to solve real problems. For example, I think making decisions using p-values is a terrible and logically incoherent idea that’s led to lots of scientific disasters; at the same time, many scientists do manage to use p-values as tools for learning. I recognize that. Similarly, I’d recommend that Pearl recognize that the apparatus of statistics, hierarchical regression modeling, interactions, poststratification, machine learning, etc etc., solves real problems in causal inference. Our methods, like Pearl’s, can also mess up—GIGO!—and maybe Pearl’s right that we’d all be better off to switch to his approach. But I don’t think it’s helping when he gives out inaccurate statements about what we do.

Best Answer

I fully agree that Pearl's tone is arrogant, and his characterisation of "statisticians" is simplistic and monolithic. Also, I don't find his writing particularly clear.

However, I think he has a point.

Causal reasoning was not part of my formal training (MSc): the closest I got to the topic was an elective course in experimental design, i.e. any causality claims required me to physically control the environment. Pearl's book Causality was my first exposure to a refutation of this idea. Obviously I can't speak for all statisticians and curricula, but from my own perspective I subscribe to Pearl's observation that causal reasoning is not a priority in statistics.

"It is true that statisticians sometimes control for more variables than is strictly necessary, but this rarely leads to error (at least in my experience)."

This is also a belief that I held after graduating with an MSc in statistics in 2010.

However, it is deeply incorrect. When you control for a common effect (called a "collider" in the book), you can introduce selection bias. This realization was quite astonishing to me, and really convinced me of the usefulness of representing my causal hypotheses as graphs.
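A short regression sketch shows how adding a collider as a covariate can bias an estimate that was fine without it. The data are simulated under assumptions of my own choosing, not taken from the book: `x` causes `y` with a true effect of 0.5, and `c` is a common effect of both (`x -> c <- y`). The helper `ols` is a hypothetical name wrapping `numpy.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Assumed data-generating process (illustrative coefficients only):
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)   # true causal effect of x on y is 0.5
c = x + y + rng.normal(size=n)     # c is caused by both x and y: a collider


def ols(response, *covariates):
    """Least-squares coefficients: intercept first, then one per covariate."""
    design = np.column_stack([np.ones_like(response), *covariates])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


effect_unadjusted = ols(y, x)[1]    # regress y on x only
effect_adjusted = ols(y, x, c)[1]   # "control for" the collider c as well
print(effect_unadjusted, effect_adjusted)
```

With these made-up coefficients the unadjusted slope should land near 0.5, while the slope adjusted for `c` comes out near -0.25: controlling for the collider not only biases the estimate but flips its sign.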

EDIT: I was asked to elaborate on selection bias. This topic is quite subtle; I highly recommend perusing the edX MOOC on Causal Diagrams, a very nice introduction to graphs which has a chapter dedicated to selection bias.

For a toy example, to paraphrase this paper cited in the book: Consider the variables A=attractiveness, B=beauty, C=competence. Suppose that B and C are causally unrelated in the general population (i.e., beauty does not cause competence, competence does not cause beauty, and beauty and competence do not share a common cause). Suppose also that either B or C alone is sufficient for being attractive, i.e. A is a collider. Conditioning on A creates a spurious association between B and C: among attractive people, someone who is not beautiful must be competent.
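The attractiveness example can be simulated directly. This is a sketch with assumed numbers: beauty and competence are independent yes/no traits, each present in 30% of the population (the 30% is arbitrary), and a person counts as attractive if either trait is present. The helper `p_competent` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Assumed prevalences (arbitrary, for illustration only):
beauty = rng.random(n) < 0.3        # B
competence = rng.random(n) < 0.3    # C, drawn independently of B
attractive = beauty | competence    # A: either trait alone suffices, so A is a collider


def p_competent(mask):
    """Share of competent people within the subgroup selected by mask."""
    return competence[mask].mean()


# Whole population: knowing B tells us nothing about C.
print(p_competent(beauty), p_competent(~beauty))
# Attractive people only: conditioning on A links B and C.
print(p_competent(attractive & beauty), p_competent(attractive & ~beauty))
```

In the whole population both printed rates should be close to 0.3. Among attractive people, the non-beautiful ones are competent by construction (the rate is exactly 1.0), while the beautiful ones stay near 0.3, so selecting on A has manufactured a negative association between B and C.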

A more serious example is the "birth weight paradox", according to which a mother's smoking (S) during pregnancy seems to decrease the mortality (M) of the baby, if the baby is underweight (U). The proposed explanation is that birth defects (D) also cause low birth weight, and also contribute to mortality. The corresponding causal diagram is { S -> U, D -> U, U -> M, S -> M, D -> M }, in which U is a collider on the path S -> U <- D; conditioning on it introduces a spurious association between S and D, and through D, between S and M. The intuition behind this is that if the mother is a smoker, the low birth weight is less likely to be due to a defect.
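The birth weight paradox can be reproduced with a simulation that follows the diagram above. All probabilities here are invented for illustration, not estimates from real data: smoking raises both the risk of low birth weight and, directly, the risk of death; defects are rarer but raise both risks far more. The helper `mortality` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

# Invented probabilities, chosen only to follow the causal diagram.
smoker = rng.random(n) < 0.3   # S
defect = rng.random(n) < 0.05  # D, independent of S
# U depends on both S and D (S -> U <- D), so U is a collider.
p_under = np.select(
    [smoker & defect, defect, smoker],
    [0.9, 0.8, 0.3],
    default=0.05,
)
underweight = rng.random(n) < p_under  # U
# M depends on S, D and U; smoking raises mortality directly (S -> M).
p_death = 0.01 + 0.02 * smoker + 0.3 * defect + 0.05 * underweight
death = rng.random(n) < p_death  # M


def mortality(mask):
    """Observed death rate within the subgroup selected by mask."""
    return death[mask].mean()


print("all babies:        ", mortality(smoker), "vs", mortality(~smoker))
print("underweight babies:", mortality(smoker & underweight), "vs",
      mortality(~smoker & underweight))
```

With these numbers smokers' babies should have roughly double the overall mortality (about 0.06 vs 0.03), yet among underweight babies the smokers' group has the lower mortality (about 0.12 vs 0.20), because an underweight baby of a non-smoker is much more likely to have a defect.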