From the Wikipedia page titled "Correlation does not imply causation":
For any two correlated events, A and B, the different possible relationships include:
- A causes B (direct causation);
- B causes A (reverse causation);
- A and B are consequences of a common cause, but do not cause each other;
- A and B both cause C, which is (explicitly or implicitly) conditioned on;
- A causes B and B causes A (bidirectional or cyclic causation);
- A causes C which causes B (indirect causation);
- There is no connection between A and B; the correlation is a
coincidence.
What does the fourth point mean? "A and B both cause C, which is (explicitly or implicitly) conditioned on." If A and B cause C, why do A and B have to be correlated?
Best Answer
"Conditioning" is a word from probability theory : https://en.wikipedia.org/wiki/Conditional_probability
Conditioning on C means that we only look at cases where C is true. "Implicitly" means that we may not state this restriction explicitly, and may not even be aware that we are making it.
The point is that, when A and B both cause C, a correlation observed between A and B among the cases where C is true does not mean there is a real relationship between A and B. It is the conditioning on C (perhaps unwittingly) that creates an artificial correlation.
Let's take an example.
In a country there exist exactly two diseases, and they are perfectly independent. Call A: "person has the first disease", B: "person has the second disease". Assume $P(A)=0.1$, $P(B)=0.1$.
Now every person who has at least one of these diseases goes to see the doctor, and nobody else does. Call C: "person goes to see the doctor". We have $C=A \text{ or } B$.
Now let's calculate a few probabilities:
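Working only from the assumptions stated above ($A$ and $B$ independent, $P(A)=P(B)=0.1$, $C=A \text{ or } B$), the values come out roughly as follows. This is a reconstruction from those assumptions, with results rounded to three decimals:

$$
\begin{aligned}
P(C) &= P(A) + P(B) - P(A)P(B) = 0.1 + 0.1 - 0.01 = 0.19 \\
P(A \mid C) &= P(B \mid C) = \frac{0.1}{0.19} \approx 0.526 \\
P(A \text{ and } B \mid C) &= \frac{0.01}{0.19} \approx 0.053 \\
P(A \mid C)\,P(B \mid C) &\approx 0.277 \\
P(B \mid A, C) &= P(B \mid A) = 0.1 \\
P(B \mid \text{not } A, C) &= \frac{0.09}{0.09} = 1
\end{aligned}
$$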
Clearly, when conditioned on C, $A$ and $B$ are very far from being independent. In fact, conditioned on C, "not $A$" seems to "cause" $B$: a patient who does not have the first disease must have the second, otherwise they would not be at the doctor's.
If you use the list of persons who were recorded by their doctor(s) as a data source for an analysis, then there seems to be a strong (negative) correlation between diseases $A$ and $B$. You may not be aware of the fact that your data source is actually a conditioning. This is also called a "selection bias".
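The selection effect in this example can be checked with a short exact calculation. The following is only a minimal sketch under the assumptions of the example (two independent diseases with probability 0.1 each, and a doctor's visit exactly when at least one is present). The helper names `joint`, `prob`, `correlation` and `sees_doctor` are made up for illustration.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# Assumed values from the example: two independent diseases.
P_A = Fraction(1, 10)
P_B = Fraction(1, 10)


def joint():
    """Yield (a, b, probability) for all four combinations of A and B."""
    for a, b in product([True, False], repeat=2):
        yield a, b, (P_A if a else 1 - P_A) * (P_B if b else 1 - P_B)


def prob(event, given=lambda a, b: True):
    """Exact conditional probability P(event | given)."""
    den = sum(p for a, b, p in joint() if given(a, b))
    num = sum(p for a, b, p in joint() if given(a, b) and event(a, b))
    return num / den


def correlation(given=lambda a, b: True):
    """Correlation between the indicators of A and B, restricted to `given`."""
    pa = prob(lambda a, b: a, given)
    pb = prob(lambda a, b: b, given)
    pab = prob(lambda a, b: a and b, given)
    cov = pab - pa * pb
    return float(cov) / sqrt(float(pa * (1 - pa) * pb * (1 - pb)))


def sees_doctor(a, b):
    """Event C: the person has at least one of the two diseases."""
    return a or b


if __name__ == "__main__":
    print("P(B | C)        =", prob(lambda a, b: b, sees_doctor))
    print("P(B | not A, C) =",
          prob(lambda a, b: b, lambda a, b: sees_doctor(a, b) and not a))
    print("correlation, whole population:", correlation())
    print("correlation, patients only   :", correlation(sees_doctor))
```

In the whole population the correlation is 0, as expected for independent diseases, while among patients only it is about -0.9. Nothing about the diseases has changed; only the set of people being looked at has.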