Interview question: If correlation doesn't imply causation, how do you detect causation?

Tags: causality, correlation, self-study

I got this question:

If correlation doesn't imply causation, how do you detect causation?

in an interview.

My answer was: You do some form of A/B testing. The interviewer kept prodding me for another approach but I couldn't think of any, and he wouldn't tell me whether my initial response was correct or not.

Are there any other approaches? And was my response correct?

Best Answer

There are a few ways around this, and you are right that A/B testing (i.e., a randomized experiment) is one of them. The 2019 economics Nobel was awarded for pioneering the use of field experiments to study anti-poverty policies, which do exactly this.
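
To make the randomized-experiment idea concrete, here is a minimal sketch on simulated data (assuming NumPy and SciPy are available; every number and variable name is made up for illustration). Because treatment is assigned at random, a simple difference in means estimates the causal effect.

```python
# A/B-testing sketch: randomization breaks any link between treatment and
# confounders, so a difference in means estimates the causal effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 10_000

T = rng.integers(0, 2, size=n)                      # randomly assigned treatment
Y = 1.0 + 2.0 * T + rng.normal(scale=3.0, size=n)   # true effect is 2.0

diff = Y[T == 1].mean() - Y[T == 0].mean()
res = stats.ttest_ind(Y[T == 1], Y[T == 0], equal_var=False)
print(f"estimated effect: {diff:.2f}, p-value: {res.pvalue:.3g}")
```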

Otherwise, you could pursue one of the following alternatives:

  1. Selection on observables. Probably the most popular approach. You assume that, conditional on some control variables, treatment assignment is as good as random. In the potential outcomes framework, with a binary treatment you can state this assumption as $Y_i(1), Y_i(0) \perp T_i \mid X_i$, where $T_i\in\{0,1\}$ is the treatment status, $Y_i(t)$ is unit $i$’s outcome under treatment status $t$, and $X_i$ is a vector of $i$’s characteristics. The ideal way to guarantee this is to randomize $T_i$, but other approaches that rely on the same assumption are matching (including ML methods such as causal trees), inverse probability weighting, and the more ubiquitous method of adding $X_i$ as additional covariates in a linear regression. Computer science has gifted us the theory of directed acyclic graphs for causal inference, which helps us think about which variables are good and which are bad to include in $X_i$. (A minimal regression-adjustment sketch appears after this list.)
  2. Regression discontinuity designs. This method is very popular because it offers a credible causal interpretation of the results. To illustrate the idea, take the example of a spatial discontinuity. Suppose there was an earthquake and kids inside a certain zone were required to stay out of school for three months, while kids just outside the boundary had no disruption to their schooling. You can then compare kids just inside the zone to kids just outside it, and plausibly the only thing that differs between them is school attendance. Regressing their subsequent years of schooling, college attendance, etc., on which side of the boundary they lived gives the causal effect of school attendance. Note that how to choose the right window around the discontinuity and how to implement the RD estimator are subtle questions with a literature behind them (see @olooney’s comment to this answer, and the local-linear sketch after this list).
  3. Instrumental variables. This is similar to regression discontinuity but usually much harder to defend. An instrument is a variable that you believe is correlated with the outcome only through the treatment status (that is, only through the variable whose effect you want to measure). If this holds, you can use two-stage least squares to estimate the causal effect (see the sketch after this list). This genre has a small library’s worth of research on how things can go wrong when the assumptions fail, and even when they do not. Note that an RD can supply a valid instrument: in the earthquake example, which side of the boundary someone lived on can be an instrument for school attendance because it is plausibly uncorrelated with anything else that explains the outcomes. Other clever strategies in this category are shift-share (Bartik) instruments, which also have a literature exploring the assumptions they rely on.
  4. Difference-in-differences. This method relaxes selection on observables. It moves to a before-after setting and compares the average outcome change in the treatment group to the average outcome change in the control group. The assumption it relies on is parallel trends: absent the treatment, the treatment group’s average outcome would have changed by the same amount as the control group’s. The method is incredibly popular because it is more robust than selection on observables, and settings where it can be credibly applied are more common than for regression discontinuity or instrumental variables. A famous example is the minimum wage study of Card and Krueger, who compared fast-food employment in New Jersey and neighboring eastern Pennsylvania before and after New Jersey raised its minimum wage. A relatively recent variant is the synthetic control method, which constructs an artificial control group as a weighted combination of untreated units and then does diff-in-diff against it; you may or may not find it credible. (See the 2x2 diff-in-diff sketch after this list.)
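
To illustrate approach 1, here is a minimal regression-adjustment sketch on simulated data (assuming NumPy and statsmodels; all names and numbers are invented for illustration). The naive difference in means is biased because treatment depends on the confounder $X$; adding $X$ as a covariate recovers the true effect.

```python
# Selection on observables: adjust for the confounder X by adding it as a
# covariate in a linear regression (simulated data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 20_000

X = rng.normal(size=n)                          # observed confounder
T = (X + rng.normal(size=n) > 0).astype(float)  # treatment more likely when X is high
Y = 2.0 * T + 3.0 * X + rng.normal(size=n)      # true treatment effect is 2.0

naive = Y[T == 1].mean() - Y[T == 0].mean()     # biased: mixes in the effect of X
adjusted = sm.OLS(Y, sm.add_constant(np.column_stack([T, X]))).fit().params[1]

print(f"naive difference: {naive:.2f}")         # well above 2.0
print(f"regression-adjusted: {adjusted:.2f}")   # close to the true 2.0
```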
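For approach 2, a sketch of a sharp regression discontinuity on simulated data (again assuming NumPy and statsmodels, with an arbitrary bandwidth rather than a data-driven choice): fit a local linear regression with separate slopes on each side of the cutoff and read off the jump at the cutoff.

```python
# Sharp RD sketch: compare units just on either side of a cutoff using a
# local linear regression within a bandwidth (simulated data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50_000

running = rng.uniform(-1, 1, size=n)        # e.g. signed distance to the boundary
T = (running >= 0).astype(float)            # treatment switches on at the cutoff
Y = 1.5 * T + 0.8 * running + rng.normal(size=n)  # true jump at the cutoff is 1.5

h = 0.2                                     # ad-hoc bandwidth around the cutoff
win = np.abs(running) <= h

# Local linear regression with separate slopes on each side; the coefficient
# on T is the estimated jump in Y at the cutoff.
design = np.column_stack([T[win], running[win], T[win] * running[win]])
fit = sm.OLS(Y[win], sm.add_constant(design)).fit()
print(f"estimated effect at the cutoff: {fit.params[1]:.2f}")  # roughly 1.5
```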
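For approach 3, a hand-rolled two-stage least squares sketch on simulated data (assuming NumPy and statsmodels; in practice you would use a dedicated IV routine to get correct standard errors). The instrument $Z$ shifts the treatment but affects the outcome only through it, so 2SLS recovers the causal effect even though OLS is biased by the unobserved confounder.

```python
# Two-stage least squares sketch with a single instrument Z (simulated data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50_000

U = rng.normal(size=n)                        # unobserved confounder
Z = rng.normal(size=n)                        # instrument: independent of U
T = 0.8 * Z + U + rng.normal(size=n)          # treatment driven by Z and U
Y = 2.0 * T + 3.0 * U + rng.normal(size=n)    # true causal effect of T is 2.0

# Stage 1: predict the treatment from the instrument.
T_hat = sm.OLS(T, sm.add_constant(Z)).fit().fittedvalues
# Stage 2: regress the outcome on the predicted treatment (note: these are not
# the proper 2SLS standard errors; use a dedicated IV routine for inference).
iv_est = sm.OLS(Y, sm.add_constant(T_hat)).fit().params[1]

ols_est = sm.OLS(Y, sm.add_constant(T)).fit().params[1]   # biased by U
print(f"naive OLS: {ols_est:.2f}, 2SLS: {iv_est:.2f}")    # 2SLS is close to 2.0
```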
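For approach 4, a 2x2 difference-in-differences sketch on simulated data (NumPy only; the data-generating process is constructed so that parallel trends holds): the estimate is the outcome change in the treated group minus the outcome change in the control group.

```python
# 2x2 diff-in-diff sketch: two groups, two periods, treatment hits the treated
# group only in the post period (simulated data).
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

group = rng.integers(0, 2, size=n)            # 1 = treated group, 0 = control
post = rng.integers(0, 2, size=n)             # 1 = after the policy change
treated = group * post                        # actually exposed to the policy

# Group-specific levels plus a common time trend satisfy parallel trends here;
# the true treatment effect is 1.5.
Y = 2.0 * group + 1.0 * post + 1.5 * treated + rng.normal(size=n)

def cell_mean(g, p):
    return Y[(group == g) & (post == p)].mean()

# DiD estimate: (change for treated) minus (change for control).
did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(f"difference-in-differences estimate: {did:.2f}")   # roughly 1.5
```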