Did Kolmogorov’s probability “experiments” survive in the modern theory?

conditional probability, independence, probability, probability theory

In his probability book, Kolmogorov defines the independence of multiple "experiments" before using that notion to define the independence of events. Here is an excerpt from Section 5, Chapter 1 (note that Kolmogorov uses $E$ to refer to the sample space, while I will use $\Omega$ in what follows):

Let us turn to the definition of independence. Given $n$ experiments $\mathfrak{A}^{(1)}, \mathfrak{A}^{(2)}, \ldots, \mathfrak{A}^{(n)}$, that is, $n$ decompositions

$$E = A_1^{(i)} + A_2^{(i)} + \cdots + A_{r_i}^{(i)} \quad \quad i = 1,2, \ldots, n$$

of the basic set $E$. It is then possible to assign (in the general case) $r = r_1 r_2 \ldots r_n$ probabilities

$$p_{q_1 q_2 \ldots q_n} = \mathsf{P} \left ( A_{q_1}^{(1)} A_{q_2}^{(2)} \ldots A_{q_n}^{(n)} \right ) \geq 0$$

which are entirely arbitrary except for the single condition that

$$\sum_{q_1, q_2, \ldots, q_n} p_{q_1 q_2 \ldots q_n} = 1$$

DEFINITION I. $n$ experiments $\mathfrak{A}^{(1)}, \mathfrak{A}^{(2)}, \ldots, \mathfrak{A}^{(n)}$ are
called mutually independent, if for any $q_1, q_2, \ldots, q_n$ the following equation holds true:

$$\mathsf{P} \left ( A_{q_1}^{(1)} A_{q_2}^{(2)} \ldots A_{q_n}^{(n)} \right ) = \mathsf{P} \left ( A_{q_1}^{(1)} \right ) \mathsf{P} \left ( A_{q_2}^{(2)} \right ) \ldots \mathsf{P} \left ( A_{q_n}^{(n)} \right )$$

In modern language, an "experiment" is represented by the sample space, $\Omega$, of a probability space $\left(\Omega, \mathfrak{F}, P\right)$, with the elements of $\Omega$ being the elementary outcomes of that "experiment," and elements of $\mathfrak{F}$ being all possible events associated to the "experiment."

But here Kolmogorov is describing something else: an experiment is a partition of the sample space into a set of disjoint events. In my understanding, this could be thought of as a specific question about the system, with the disjoint events being the set of all possible answers to that question. For example, when flipping two coins, you could ask (i) "Was the first coin heads?" with experiment $\left\{A, A^c \right\}$, where $A$ is the event that the first coin was heads, or (ii) "How many heads appeared?", where the experiment would be $\left\{A_0, A_1, A_2 \right\}$ with $A_i$ being the event of seeing $i$ heads.
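Here is a minimal Python sketch of this two-coin picture, checking Kolmogorov's Definition I directly on pairs of partitions; the "second coin" experiment is added only for contrast and is not part of the discussion above.

```python
from itertools import product

# Sample space for two fair coin flips; each of the 4 outcomes has probability 1/4.
omega = list(product("HT", repeat=2))
P = lambda event: sum(1 for w in omega if w in event) / len(omega)

# Experiment (i): "Was the first coin heads?"  -> partition {A, A^c}.
A = {w for w in omega if w[0] == "H"}
exp1 = [A, set(omega) - A]

# Experiment (ii): "How many heads appeared?" -> partition {A_0, A_1, A_2}.
exp2 = [{w for w in omega if w.count("H") == k} for k in range(3)]

# Added for contrast (hypothetical, not in the question): "Was the second coin heads?"
B = {w for w in omega if w[1] == "H"}
exp3 = [B, set(omega) - B]

def independent(expa, expb):
    """Kolmogorov's Definition I for two experiments (finite partitions)."""
    return all(abs(P(a & b) - P(a) * P(b)) < 1e-12 for a in expa for b in expb)

print(independent(exp1, exp3))  # True: the two coin tosses are independent
print(independent(exp1, exp2))  # False: first coin and total head count are not
```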

To define the independence of events, Kolmogorov states that two events, $A$ and $B$, are independent if their associated experiments, $\left\{A, A^c \right\}$ and $\left\{B, B^c \right\}$, are independent. The four equations that result from Definition I in the excerpt above reduce to a single independent equation, which can be taken to be $P\left(A\cap B\right) = P(A)P(B)$.
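Indeed, assuming only $P(A \cap B) = P(A)P(B)$, the other three equations demanded by Definition I follow by complementation:

$$\begin{aligned} P(A \cap B^c) &= P(A) - P(A \cap B) = P(A) - P(A)P(B) = P(A)P(B^c),\\ P(A^c \cap B) &= P(B) - P(A \cap B) = P(B) - P(A)P(B) = P(A^c)P(B),\\ P(A^c \cap B^c) &= 1 - P(A \cup B) = 1 - P(A) - P(B) + P(A)P(B) = P(A^c)P(B^c). \end{aligned}$$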

He then defines the conditional probability of an event $B$ with respect to an experiment $\mathfrak{A}=\left\{A_1, A_2, \ldots \right\}$ as a random variable that takes the (already defined, standard) value $P(B \mid A_i)$ at every point of the cell $A_i$ of the partition of $\Omega$ defined by $\mathfrak{A}$. This definition generalizes nicely in later sections (Chapter 5) to conditional probability with respect to a random variable, which can itself be thought of as defining a partition of $\Omega$, and thus an "experiment."
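In modern notation this random variable could be written (my paraphrase, assuming every cell has $P(A_i) > 0$) as

$$P\left(B \mid \mathfrak{A}\right)(\omega) = \sum_i P\left(B \mid A_i\right) \mathbf{1}_{A_i}(\omega),$$

i.e. it is constant on each cell $A_i$, where it equals the elementary conditional probability $P(B \mid A_i)$.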

My question is: what is the purpose of these definitions (Kolmogorov's "experiments", and the associated definitions of independence and conditional probabilities) in probability theory? I ask because I cannot find very much discussion of them in probability textbooks or on the internet. It seems that either they have been discarded as not vital to the theory, or have been replaced and renamed, so that I cannot find them.

Best Answer

It seems that the "experiments" of Kolmogorov have been replaced by the more general notion of "independence between classes." Let $(\Omega, \mathscr{F}, \mathbf{P})$ be a probability space and let $\mathscr{C}_\alpha \subset \mathscr{F}$ be "classes of events." We say these classes are mutually independent if any choice of events $\mathrm{C}_\alpha \in \mathscr{C}_\alpha$ results in the events $\mathrm{C}_\alpha$ being independent in the usual sense (that is, for any finite choice of distinct indices $\alpha_i,$ $1 \leq i \leq p,$ we have $\mathbf{P}\left(\bigcap\limits_{i = 1}^p \mathrm{C}_{\alpha_i} \right) = \prod\limits_{i = 1}^p \mathbf{P}(\mathrm{C}_{\alpha_i})$). A well-known result (prove it, it is not too hard!) shows that we can enlarge each $\mathscr{C}_\alpha$ by adding:

  1. Proper differences in $\mathscr{C}_\alpha$ (i.e. add $\mathrm{A} - \mathrm{B}$ where $\mathrm{B} \subset \mathrm{A}$ both belong to $\mathscr{C}_\alpha$).
  2. The sets $\varnothing, \Omega.$
  3. Countable disjoint unions of sets in $\mathscr{C}_\alpha.$
  4. Limits of monotone sequences in $\mathscr{C}_\alpha$ (i.e. add $\lim\limits_{n \to\infty} \mathrm{A}_n$ where each $\mathrm{A}_n \in \mathscr{C}_\alpha$ and either $\mathrm{A}_n \subset \mathrm{A}_{n+1}$ for all $n$ or $\mathrm{A}_{n+1} \subset \mathrm{A}_n$ for all $n$ ).
  5. Intersections cannot in general be added (look for a simple counter-example). However, if we further assume that the classes are $\pi$-systems (each one is closed under intersections; of course, you can sometimes use enlargements 1-4 to turn the original classes into $\pi$-systems, for instance when they fail to be $\pi$-systems only because they contain two disjoint events but lack $\varnothing$), then intersections can be added and therefore (by the monotone class theorem) the sigma-fields $\mathscr{F}_\alpha = \sigma(\mathscr{C}_\alpha)$ are independent!

Kolmogorov takes $\mathscr{C}_j = \mathfrak{A}^{(j)}$ ($1 \leq j \leq n$) to be finite partitions of $\Omega$, and all five aforementioned enlargements apply. In particular, once $\varnothing$ is added, each partition becomes a $\pi$-system (distinct cells are disjoint, so their intersection is $\varnothing$), and therefore the generated sigma-fields $\sigma\left(\mathfrak{A}^{(j)}\right)$ are independent.
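As a quick numerical sanity check (a toy sketch with the two coins from the question, not part of the argument above), the two single-coin partitions are independent, and so are the sigma-fields they generate, obtained here by taking all unions of cells:

```python
from itertools import product, combinations

# Two fair coins; uniform probability on the 4 outcomes.
omega = list(product("HT", repeat=2))
P = lambda E: sum(1 for w in omega if w in E) / len(omega)

def sigma(partition):
    """Sigma-field generated by a finite partition: all unions of its cells."""
    cells = [frozenset(c) for c in partition]
    return [frozenset().union(*combo)
            for r in range(len(cells) + 1)
            for combo in combinations(cells, r)]

first  = [{w for w in omega if w[0] == "H"}, {w for w in omega if w[0] == "T"}]
second = [{w for w in omega if w[1] == "H"}, {w for w in omega if w[1] == "T"}]

F1, F2 = sigma(first), sigma(second)
# Independence of the partitions propagates to the generated sigma-fields.
print(all(abs(P(a & b) - P(a) * P(b)) < 1e-12 for a in F1 for b in F2))  # True
```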

You also asked about the importance of independence and conditional probability. To begin with, independence is the cornerstone of probability theory, and it is independence that distinguishes probability theory from measure theory. Independence is intimately linked to conditional probability, since the definition of independence is the same as saying that conditioning has no probabilistic effect (i.e. the measure remains the same, even when we restrict to a subset of the space). For instance, if $E \subset \Omega$ is an event, then we can induce a canonical probability measure on $E$ by setting $P|_E(A) = c P(A)$ for $A \subset E$ measurable, where $c$ is a proportionality constant guaranteeing that we obtain a genuine probability measure on $E.$ Clearly, $c = 1/P(E).$ The effect of this canonical construction is that we have changed the underlying measurable space from $(\Omega, \mathscr{F})$ to $(E, \mathscr{F}_E := \{A \cap E \mid A \in \mathscr{F}\}).$ This is fine, but an analytically better approach is to fix the measurable space $(\Omega, \mathscr{F})$ and define the measure conditional on the occurrence of $E$ as $P(A \mid E) = P(A \cap E) / P(E)$ for $A \in \mathscr{F}.$ In this way, we can consider many induced measures simultaneously, since each of them is defined on the same measurable space. Independence then connects with conditional probability in that $P(A \mid E) = P(A)$ is equivalent to independence, provided $P(E) > 0$ (if $P(E) = 0,$ then $E$ is already independent of every event).

Unless you study probability theory, the effects of independence are hard to grasp. Perhaps the best-known examples are Markov processes (the probabilistic analogue of classical mechanics, where knowing the state of the process at some time allows one to drop the past, as it is of no use in predicting the future) and Renewal Theory (processes in which the occurrence of some event makes the process start afresh; the classical example is a random walk returning to zero). An important aspect of independence is that the process starts afresh in a probabilistic sense, not in an $\omega$-by-$\omega$ sense.
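To make the "conditioning has no effect" point concrete, here is a small sketch (my own toy example, reusing the two coins from the question) of the induced measure $P(\cdot \mid E)$ defined on the same measurable space:

```python
from itertools import product

# Toy example: two fair coins, uniform measure on the 4 outcomes.
omega = set(product("HT", repeat=2))
P = lambda A: len(A) / len(omega)

def conditional(P, E):
    """The induced measure P(. | E), defined on the same measurable space."""
    assert P(E) > 0, "conditioning event must have positive probability"
    return lambda A: P(A & E) / P(E)

E = {w for w in omega if w[0] == "H"}        # first coin heads
A = {w for w in omega if w[1] == "H"}        # second coin heads: independent of E
B = {w for w in omega if w.count("H") == 2}  # both heads: not independent of E

P_E = conditional(P, E)
print(P_E(A), P(A))  # 0.5 0.5  -> conditioning on E has no effect: independence
print(P_E(B), P(B))  # 0.5 0.25 -> conditioning on E changes the probability
```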

I would like to add that statisticians and physicists sometimes misconstrue the definition of conditional probability in a rather fundamental way. (Bayesian statisticians are particularly guilty here.) They think of probability as a measure of belief in something (whatever that means) and of conditioning as making a hypothesis. In this sense, conditional probability is (mis)construed as "updating one's beliefs in the occurrence of an experiment knowing a hypothesis is true." The problem with this interpretation is that probability theory is based on measure theory, and the theorems of probability are theorems about frequencies over a long run of independent repetitions. In other words, probability measures really model the notion that if we repeat an experiment $n$ times, then a fraction $p$ of those times the experiment will be a success (e.g. the Law of Large Numbers or ergodic-type theorems). The statistical notion of belief is undefined as far as I am concerned, and if it is defined but not fully equivalent to this frequentist notion, then they are using the wrong mathematical tool. (I have been unfortunate enough to read PhD theses of statisticians or physicists who go on with deep philosophical debates over whether or not Kolmogorov's axioms "are correct." Of course they are; mathematics is concerned only with correctness within itself. Whether or not the person who wants to apply mathematical tools to real-world problems is misconstruing definitions or interpretations is of no mathematical concern.) Anyway, this is already too long...
