Causality Theories – Which Theories of Causality Should be Known?

causality, machine-learning, mathematical-statistics, treatment-effect

Which theoretical approaches to causality should I know as an applied statistician/econometrician?

I know (a very little bit) about Granger causality, the Neyman-Rubin potential outcomes framework, and Pearl's causal graphs / structural equation modeling.

Which concepts am I missing, or what else should I be aware of?

Related: Which theories are foundations for causality in machine learning?

I have read these interesting questions and their answers (1, 2, 3), but I think this is a different question. I was also very surprised to see that "causality", for example, is not mentioned in The Elements of Statistical Learning.

Best Answer

Strictly speaking, "Granger causality" is not about causality at all. It's about predictive ability and time precedence: you check whether one time series is useful for predicting another. It's suited to claims like "usually $A$ happens before $B$ happens" or "knowing the past of $A$ helps me predict $B$, but not the other way around" (even after accounting for all past information about $B$). The choice of this name was very unfortunate, and it's the cause of several misconceptions.
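To see what the test actually checks, here's a minimal sketch, assuming Python with `numpy` and `statsmodels` (the simulated series and coefficients are made up purely for illustration):

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
n = 500

# A leads B by one period, so A's past should help predict B.
a = rng.normal(size=n)
b = np.zeros(n)
for t in range(1, n):
    b[t] = 0.8 * a[t - 1] + rng.normal()

# Tests whether the second column (A) helps predict the first (B)
# beyond B's own past; a small p-value flags "Granger causality".
res = grangercausalitytests(np.column_stack([b, a]), maxlag=2)
print(res[1][0]["ssr_ftest"])  # (F statistic, p-value, df_denom, df_num)
```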

While it's almost uncontroversial that a cause has to precede its effect in time, time precedence alone does not license causal conclusions: you still need to rule out confounding, among other sources of spurious association.
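The same kind of sketch shows the point: with a latent common cause, neither series affects the other, yet the test still fires (again, the model here is an assumption made up for illustration).

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(1)
n = 500

# A latent common cause Z hits A one period before it hits B;
# neither series has any causal effect on the other.
z = rng.normal(size=n)
a = np.zeros(n)
b = np.zeros(n)
for t in range(2, n):
    a[t] = 0.9 * z[t - 1] + 0.1 * rng.normal()
    b[t] = 0.9 * z[t - 2] + 0.1 * rng.normal()

# A "Granger-causes" B, but intervening on A would do nothing to B:
# the predictive relationship comes entirely from the unobserved Z.
res = grangercausalitytests(np.column_stack([b, a]), maxlag=2)
print(res[1][0]["ssr_ftest"])
```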

Now regarding the Potential Outcomes (Neyman-Rubin) versus Causal Graphs/Structural Equation Modeling (Pearl), I would say this is a false dilemma and you should learn both.

First, it's important to notice that these are not opposite views about causality. As Pearl puts it, there's a hierarchy regarding (causal) inference tasks:

  1. Observational prediction
  2. Prediction under intervention
  3. Counterfactuals

For the first task, you only need to know the joint distribution of the observed variables. For the second task, you additionally need to know the causal structure. For the last task, counterfactuals, you further need information about the functional forms of your structural equation model.
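A toy simulation can make the hierarchy concrete. This is a sketch under an assumed linear structural model ($Z \to X$, $Z \to Y$, $X \to Y$); the equations and coefficients are made up purely for illustration:

```python
import numpy as np

# Toy linear SEM:  Z = Uz,   X = Z + Ux,   Y = X + 2Z + Uy
rng = np.random.default_rng(0)
n = 100_000
uz, ux, uy = rng.normal(size=(3, n))
z = uz
x = z + ux
y = x + 2 * z + uy

# Task 1, observational prediction: E[Y | X ~ 1].
# Conditioning on X also carries information about Z.
print(y[np.abs(x - 1) < 0.05].mean())   # ~2.0

# Task 2, prediction under intervention: E[Y | do(X = 1)].
# Cut the Z -> X edge and set X by hand; only Y's equation is reused.
y_do = 1.0 + 2 * z + uy
print(y_do.mean())                      # ~1.0

# Task 3, counterfactual for a single observed unit:
# abduction (recover the unit's noise), action (set X = 0), prediction.
i = 0
uy_i = y[i] - x[i] - 2 * z[i]
y_cf = 0.0 + 2 * z[i] + uy_i
print(y[i], y_cf)                       # factual Y vs "had X been 0"
```

Note how task 3 needs the functional form of $Y$'s equation to recover the unit-level noise, exactly as stated above.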

So, when talking about counterfactuals, there is a formal equivalence between the two perspectives: potential outcomes take counterfactual statements as primitives, while in DAGs counterfactuals are derived from the structural equations. However, you might ask, if they are "equivalent", why bother learning both? Because they differ in how easy they make it to express and derive things.

For example, try to express the concept of M-bias using only potential outcomes; I have never seen it done well. In fact, my experience so far is that researchers who never studied graphs aren't even aware of it. Also, casting the substantive assumptions of your model in graphical language makes it computationally easier to derive its empirically testable implications and to answer questions of identifiability. On the other hand, people sometimes find it easier to think directly about the counterfactuals themselves and to combine this with parametric assumptions to answer very specific queries.
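For the curious, here is a sketch of M-bias in simulation, assuming the standard M-structure with two latent variables (coefficients made up for illustration): the unadjusted estimate is fine, and adjusting for the pre-treatment variable $M$ is precisely what introduces bias.

```python
import numpy as np

# M-structure: U1 -> X, U1 -> M, U2 -> M, U2 -> Y, and X -> Y with
# true effect 1. M is a pre-treatment collider, not a confounder.
rng = np.random.default_rng(0)
n = 200_000
u1, u2 = rng.normal(size=(2, n))
x = u1 + rng.normal(size=n)
m = u1 + u2 + rng.normal(size=n)
y = x + u2 + rng.normal(size=n)

def ols_coef_on_x(covariates, outcome):
    """OLS coefficient on x (the first covariate), with an intercept."""
    design = np.column_stack([np.ones(len(outcome)), *covariates])
    return np.linalg.lstsq(design, outcome, rcond=None)[0][1]

print(ols_coef_on_x([x], y))      # ~1.0: unadjusted estimate is unbiased
print(ols_coef_on_x([x, m], y))   # ~0.8: adjusting for M opens the
                                  # path X <- U1 -> M <- U2 -> Y
```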

There's much more one could say, but the point here is that you should learn how to "speak both languages". For references, you can check out how to get started here.