$(A_k)_{k = 1}^{n}$ independent iff $P\left(\bigcap_{k = 1}^{n} B_k\right)= \prod_{k = 1}^{n} P(B_k)$ for every choice of $B_k \in \{A_k, A_k^C\}$

independence, probability theory

Definition (independence):
$(A_i)_{i \in I}$ is called stochastically independent if for all finite non-empty subsets $J \subset I$ we have
\begin{align}
\mathbb{P}\left(\bigcap_{j \in J} A_j\right)
= \prod_{j \in J} \mathbb{P}(A_j).
\end{align}

Theorem:
$(A_k)_{k = 1}^{n}$ is independent if and only if for each choice of $B_k \in \{A_k, A_k^C\}$, $k \in \{1, \ldots, n\}$ we have
\begin{equation*}
\mathbb{P}\left(\bigcap_{k = 1}^{n} B_k\right)
= \prod_{k = 1}^{n} \mathbb{P}(B_k).
\end{equation*}

Proof:
"$\impliedby$":
By adding the product formulas for $\{B_1, \ldots, B_n\}$ and $\{B_1^C, B_2, \ldots, B_n\}$ we obtain
\begin{equation} \tag{1}
\mathbb{P}\left(\bigcap_{k = 2}^{n} B_k\right)
= \prod_{k = 2}^{n} \mathbb{P}(B_k)
\end{equation}

Therefore one obtains the formula for intersections of $n - 1$ sets, then $n - 2$, and so on. $\square$

My Question:
I don't understand how (1) is obtained and the reasoning following it:
If I add
$$
\mathbb{P}\left(\bigcap_{k = 1}^{n} B_k\right)
= \prod_{k = 1}^{n} \mathbb{P}(B_k)
\qquad \text{and} \qquad
\mathbb{P}\left(\bigcap_{k = 2}^{n} B_k \cap B_1^C\right)
= \prod_{k = 2}^{n} \mathbb{P}(B_k) \cdot \mathbb{P}(B_1^C)
$$

I get
$$
\mathbb{P}\left(\bigcap_{k = 1}^{n} B_k\right) + \mathbb{P}\left(\bigcap_{k = 2}^{n} B_k \cap B_1^C\right)
= \prod_{k = 1}^{n} \mathbb{P}(B_k) + \prod_{k = 2}^{n} \mathbb{P}(B_k) \cdot \mathbb{P}(B_1^C)
= \prod_{k = 2}^{n} \mathbb{P}(B_k) \underbrace{\left[ \mathbb{P}(B_1) + \mathbb{P}(B_1^C) \right]}_{= 1}.
$$

But how does the LHS simplify?
I guess I have to use the definition, but since I only know that the $(A_k)_{k = 1}^{n}$ are independent by hypothesis, what can I say about the independence of $(B_k^{(C)})_{k = 1}^{n}$?

Best Answer

The crucial observation, I think, is that $\bigcap_{k=1}^n B_k$ and $B_1^\complement \cap \bigcap_{k=2}^n B_k$ are disjoint (a point cannot be both in $B_1$ and in its complement $B_1^\complement$) and have $\bigcap_{k=2}^n B_k$ as their union:

$$\bigcap_{k=2}^n B_k = \bigcap_{k=2}^n B_k \cap \left(B_1 \cup B_1^\complement\right) = \left(\bigcap_{k=1}^n B_k\right) \cup \left(\bigcap_{k=2}^n B_k \cap B_1^\complement \right)$$

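by the usual distributive law $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$. Since the two sets are disjoint, finite additivity of $\mathbb{P}$ says that the sum of their probabilities equals the probability of their union, so the LHS of your sum simplifies to $\mathbb{P}\left(\bigcap_{k=2}^n B_k\right)$.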
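
To spell out how this yields (1), here is a short worked derivation. It uses only the disjoint decomposition above, finite additivity, and the hypothesis of the "$\impliedby$" direction, which is the product formula for every choice of $B_k \in \{A_k, A_k^\complement\}$. In particular, nothing about independence of the $B_k$ has to be known in advance:

$$
\begin{aligned}
\mathbb{P}\left(\bigcap_{k=2}^n B_k\right)
&= \mathbb{P}\left(\bigcap_{k=1}^n B_k\right) + \mathbb{P}\left(B_1^\complement \cap \bigcap_{k=2}^n B_k\right) \\
&= \mathbb{P}(B_1) \prod_{k=2}^n \mathbb{P}(B_k) + \mathbb{P}(B_1^\complement) \prod_{k=2}^n \mathbb{P}(B_k) \\
&= \prod_{k=2}^n \mathbb{P}(B_k) \left[ \mathbb{P}(B_1) + \mathbb{P}(B_1^\complement) \right]
= \prod_{k=2}^n \mathbb{P}(B_k).
\end{aligned}
$$

The same argument works with any index in place of $1$, and $B_2, \ldots, B_n$ were arbitrary, so the product formula holds for every choice of $n - 1$ of the sets. Repeating the step on that formula (for example with $B_2$ and $B_2^\complement$) gives the formula for $n - 2$ sets, and so on. Taking $B_k = A_k$ throughout then gives the product formula for every finite non-empty $J \subset \{1, \ldots, n\}$, which is exactly the definition of independence.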