The difference between correlation, causation and prediction

Tags: causality, correlation, predictive-models

Suppose we have a set of events $\Omega$, containing events $A$ and $B$. My econometrics professor tried to distinguish the following three terms today.

  • Causation — $A$ causes $B$ if the occurrence of $A$ always leads to another specific outcome $B$. For example, clapping my hands causes a sound to be emitted.
  • Prediction — $A$ predicts $B$ if on average, $B$ is the expected outcome from $A$ occurring. In other words, whereas causality is deterministic, prediction is probabilistic. For example, studying for a test would predict doing well on the exam. But there's no guarantee.
  • Correlation — $A$ and $B$ are correlated if when one occurs the other does too. But there may be some hidden variable causing both.

Are these distinctions meaningful? Or did he make these terms up?

My Questions

  1. What is the difference between causation and correlation?
  2. Are "causes" always deterministic?

Best Answer

The distinction is meaningful. Unfortunately, your professor's definitions are lacking. It's not clear what "leads to" means. Those two words could be cashed out in terms of counterfactuals or interventions, but without further explanation the definition is circular: it just says that $A$ causes $B$ if $A$ always causes $B$. That is at best unhelpful.

Worse, the "always" part of the definition is false. Causation is not always deterministic. We believe that smoking causes lung cancer, yet many people who smoke never get lung cancer.

The most successful attempts to define causation do so in terms of interventions. Judea Pearl defines a causal effect like so*: $A$ causally influences $B$ if there are two states of $A$, $a$ and $a'$, such that $P(B \mid do(a)) \neq P(B \mid do(a'))$, where $do(a)$ represents an intervention that sets $A$ to the value $a$, regardless of whatever would otherwise have determined $A$. In plain English: $A$ influences $B$ if, when you intervene to change the value of $A$ from $a$ to $a'$, you change the probability distribution over the values of $B$.

*This is from memory, so it might not match his notation precisely.
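To make the interventional definition concrete, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration: the hidden variable `u`, the probabilities, and the helper names `simulate` and `prob_b` are not from Pearl or from your professor. It compares conditioning on $A$ (just observing it) with setting $A$ by intervention, in a toy world where a hidden common cause drives both $A$ and $B$.

```python
import random

def simulate(n, effect_of_a, do_a=None, seed=0):
    """Draw n samples (a, b) from a toy model with a hidden common cause u.

    Assumed structure: u -> A, u -> B, and A -> B with strength effect_of_a.
    If do_a is given, A is set by intervention and no longer depends on u.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        u = rng.random() < 0.5                      # hidden variable
        if do_a is None:
            a = rng.random() < (0.8 if u else 0.2)  # observed: A depends on u
        else:
            a = do_a                                # intervention: A is forced
        p_b = 0.1 + 0.4 * u + effect_of_a * a       # B depends on u, maybe on A
        b = rng.random() < p_b
        samples.append((a, b))
    return samples

def prob_b(samples, a_value=None):
    """Estimate P(B), optionally only among samples where A == a_value."""
    rows = [b for a, b in samples if a_value is None or a == a_value]
    return sum(rows) / len(rows)

N = 100_000

# Case 1: A has NO causal effect on B; only the hidden variable links them.
observed = simulate(N, effect_of_a=0.0, seed=1)
obs_gap = prob_b(observed, True) - prob_b(observed, False)
do_gap = (prob_b(simulate(N, 0.0, do_a=True, seed=2))
          - prob_b(simulate(N, 0.0, do_a=False, seed=3)))
print("P(B|A=1) - P(B|A=0)         =", round(obs_gap, 3))
print("P(B|do(A=1)) - P(B|do(A=0)) =", round(do_gap, 3))

# Case 2: A raises the probability of B without guaranteeing it.
p_do1 = prob_b(simulate(N, 0.3, do_a=True, seed=4))
p_do0 = prob_b(simulate(N, 0.3, do_a=False, seed=5))
print("P(B|do(A=1)) =", round(p_do1, 3), " P(B|do(A=0)) =", round(p_do0, 3))
```

In the first case the observational gap is clearly nonzero while the interventional gap is close to zero, so $A$ predicts $B$ without causing it. In the second case the interventional probabilities differ, yet $P(B \mid do(A=1))$ stays well below 1, which is the smoking situation: a cause that is not deterministic.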

There are several differences between causation and correlation. For example: correlation is symmetric, whereas causation is asymmetric. Average air temperature and altitude are correlated: holding latitude constant, if I know the temperature I can predict the altitude (with some error), and vice versa. But while altitude causally influences temperature, temperature does not influence altitude. You cannot make a mountain taller by cooling it down!
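The asymmetry can also be sketched in a small simulation. This is only a toy model under stated assumptions: the lapse rate of 6.5 degrees C per 1000 m, the sea-level temperature of 15 degrees C, the noise level, and the helper names `sample_site`, `pearson`, and `mean` are all chosen for illustration. Altitude is drawn first and temperature is computed from it; an intervention overrides one variable and leaves the rest of the mechanism alone.

```python
import random

LAPSE_RATE = 6.5  # assumed cooling in degrees C per 1000 m (illustrative)

def sample_site(rng, do_altitude=None, do_temperature=None):
    """Return (altitude_m, temperature_c) for one hypothetical site."""
    if do_altitude is None:
        altitude = rng.uniform(0, 3000)
    else:
        altitude = do_altitude            # intervention on altitude
    if do_temperature is None:
        # Temperature is generated FROM altitude, plus weather noise.
        temperature = 15.0 - LAPSE_RATE * altitude / 1000 + rng.gauss(0, 2)
    else:
        temperature = do_temperature      # intervention: cool or heat the site
    return altitude, temperature

def mean(xs):
    return sum(xs) / len(xs)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)
N = 20_000

# Observation: correlation does not care about the order of its arguments.
sites = [sample_site(rng) for _ in range(N)]
alts = [a for a, _ in sites]
temps = [t for _, t in sites]
r_at = pearson(alts, temps)
r_ta = pearson(temps, alts)
print("corr(altitude, temperature) =", round(r_at, 3))
print("corr(temperature, altitude) =", round(r_ta, 3))

# Intervening on altitude shifts the temperature distribution...
temp_gap = (mean([sample_site(rng, do_altitude=3000)[1] for _ in range(N)])
            - mean([sample_site(rng, do_altitude=0)[1] for _ in range(N)]))
# ...but intervening on temperature leaves altitude where it was.
alt_gap = (mean([sample_site(rng, do_temperature=-10)[0] for _ in range(N)])
           - mean([sample_site(rng, do_temperature=20)[0] for _ in range(N)]))
print("change in mean temperature from do(altitude):", round(temp_gap, 1))
print("change in mean altitude from do(temperature):", round(alt_gap, 1))
```

The two correlations are the same number, so correlation alone cannot tell you which way the arrow points. The interventions can: forcing the altitude moves the average temperature by many degrees, while forcing the temperature changes the average altitude only by sampling noise.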