Why are two disjoint events defined to be independent if one has zero probability?

Tags: conditional-probability, independence, probability, terminology

Let $A$ and $B$ be two disjoint events in a probability space and suppose that one of the two events has zero probability. According to the standard definition of independence, this means that $A$ and $B$ are independent. Unfortunately this definition seems very counter-intuitive to me:

If both events are non-empty, then I would instead define them to be dependent, as the occurrence of one event excludes the occurrence of the other. Why has the math community accepted a different definition? Just out of convenience? What would be the consequences if we changed the definition?

Best Answer

Why do we define independence in a way that allows for an event of probability zero to be independent of another event?

The case you are referring to can be seen as degenerate. If one event has probability $0$, then there isn't a clear way to interpret the notion of independence intuitively (in order to know whether one event occurring has some impact on another event, we first need that event to be possible). We define independence the way we do simply out of convenience (as you speculate in the question). It is convenient to have a single notion of independence so that the many results that hold for independent events can be applied as broadly as possible. If we were to exclude the cases that you refer to, this would just create additional work: we would have to prove separately that the standard results for independent events also hold in the cases that were excluded.
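To spell out the degenerate case, here is a short check under the usual product definition of independence, $P(A \cap B) = P(A)P(B)$. Suppose $A \cap B = \emptyset$ and, say, $P(A) = 0$. Then

$$P(A \cap B) = P(\emptyset) = 0 = 0 \cdot P(B) = P(A)P(B).$$

Disjointness is not even needed here: since $A \cap B \subseteq A$, we always have $P(A \cap B) \le P(A) = 0$. Note also that the alternative formulation $P(B \mid A) = P(B)$ cannot be used in this case, because $P(B \mid A) = P(A \cap B)/P(A)$ is undefined when $P(A) = 0$. The product form is the one that still makes sense.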

What would the consequences be if we modified the definition?

If we were to change the definition now, then there would be some real consequences. For example, in the standard proof of Kolmogorov's Zero-One Law, we use the fact that if the event $A$ is independent of itself then the probability of $A$ must be $0$ or $1$ (in other words: $P(A \cap A) = P(A)P(A)$ if and only if $P(A) = 0$ or $P(A)=1$) - this proof can be found in the book "Measure Theory" by Donald Cohn.
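For completeness, the equivalence used above is a one-line computation. If $A$ is independent of itself, then

$$P(A) = P(A \cap A) = P(A)P(A) = P(A)^2 \quad\Longrightarrow\quad P(A)\bigl(1 - P(A)\bigr) = 0 \quad\Longrightarrow\quad P(A) \in \{0, 1\}.$$

Conversely, if $P(A) = 0$ or $P(A) = 1$, then $P(A)^2 = P(A) = P(A \cap A)$, so $A$ is independent of itself.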

If we were to modify the definition of independence in the way that you suggest, then this proof breaks down, as the event $A$ can no longer be said to be independent of itself. It is, of course, still possible to modify the proof, but it becomes unnecessarily complicated because we can no longer refer to the notion of $A$ being independent of itself, and we can no longer apply any of the standard results that follow from independence either (without additional justification).

This is, of course, just one example, but many other proofs use similar ideas, and the definition of independence that we currently have lends itself nicely to simplifying them. The only downside (as you also pointed out) is that you lose some intuition when you think about these types of events logically. However, as examples involving events of this nature are degenerate, this isn't a big concern in the mathematical community.

Why do some definitions exclude edge cases?

There are, of course, some definitions that do exclude edge cases. However, there are usually much more serious reasons for this.

For example, one could ask why $1$ is not a prime number. We could easily have allowed $1$ to be a prime number, and to some, this might be more intuitive (like in this case).

However, if we did modify the definition of prime numbers to include the number $1$, then we would run into problems. Numbers would no longer have unique prime factorisations:

$$6 = \color{red}{3 \times2} \space = \space \color{blue}{3 \times 2 \times 1}\space = \space \color{green}{3 \times 2 \times 1 \times 1} = \space \space ... $$

This would create a lot more work for mathematicians and would also violate the Fundamental Theorem of Arithmetic (requiring it to be rewritten).

Therefore, in some cases, like this one, it is sensible to exclude an "edge case". However, in the case you describe, there are no serious ramifications of including events of probability $0$. Therefore, from the perspective of convenience, it makes sense to include the degenerate cases to simplify our analyses and proofs.