Probability Theory – A Short Proof of de Finetti’s Theorem

Tags: probability, probability-theory, weak-convergence

Consider de Finetti's theorem in the following form.

Theorem. Let $E$ be a Polish space and $(X_1,X_2,\ldots)$ an exchangeable sequence of $E$-valued random variables. Then there exists a (necessarily unique) random probability measure $\mu$ on $E$ such that, for every $n\in \mathbb{N}$ and all measurable $A_1,\ldots,A_n \subset E$,
$$\label{eq1}\tag{1}
\mathbb{P}(X_1\in A_1,\ldots , X_n \in A_n) = \mathbb{E}\left[\mu(A_1) \ldots \mu(A_n)\right].
$$
Furthermore, the weak limit of the empirical measures
$$
Z^n = \frac 1n \sum_{i=1}^n \delta_{X_i}
$$

as $n\to \infty$ exists almost-surely and is equal in law to $\mu$.
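As an illustrative sanity check (not part of the original post), the theorem can be probed numerically in the classical Beta-Bernoulli example: conditionally on $p \sim \mathrm{Beta}(2,2)$, the $X_i$ are i.i.d. Bernoulli$(p)$, so the directing random measure $\mu$ is the Bernoulli$(p)$ law. The parameters, function names, and sample sizes below are arbitrary choices for the sketch.

```python
import random

# Hedged illustration: Beta-Bernoulli example of de Finetti's theorem.
# Conditionally on p ~ Beta(2, 2), the X_i are i.i.d. Bernoulli(p), so the
# directing measure mu is the (random) Bernoulli(p) law.  We check (1) with
# A_1 = A_2 = A_3 = {1}:  P(X_1=1, X_2=1, X_3=1) = E[mu({1})^3] = E[p^3],
# and that the empirical frequency Z^n({1}) tracks the realised p.

random.seed(0)

def all_ones(k):
    """One exchangeable run: draw p, then k conditionally i.i.d. Bernoulli(p)."""
    p = random.betavariate(2, 2)
    return all(random.random() < p for _ in range(k))

trials, k = 200_000, 3
lhs = sum(all_ones(k) for _ in range(trials)) / trials

# E[p^k] for p ~ Beta(a, b), via E[p^k] = prod_{j<k} (a + j) / (a + b + j).
a, b = 2, 2
rhs = 1.0
for j in range(k):
    rhs *= (a + j) / (a + b + j)   # here (2/4) * (3/5) * (4/6) = 0.2

# Empirical measure of one long run converges to the *random* limit p.
p = random.betavariate(2, 2)
n = 100_000
freq = sum(random.random() < p for _ in range(n)) / n

assert abs(lhs - rhs) < 0.01   # identity (1), up to Monte Carlo error
assert abs(freq - p) < 0.01    # Z^n({1}) -> mu({1}) = p
```

Note that the limit of $Z^n$ is genuinely random here: each realisation of the long run concentrates near its own draw of $p$, not near the mean $1/2$.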

I wasn't aware of the theorem in this form, and "accidentally" reproved it. The proof is quite short and builds only on standard facts from the theory of weak convergence; see the proof sketch below.

Question: Is a proof of this length and method novel or interesting?

The proof does not feel especially innovative, so I would assume it is standard, but I am struggling to confirm this. The proofs I have found in the literature are either much more complicated but also more general (as in the paper of Hewitt and Savage), or, as in standard textbooks like Kallenberg's, short but built on so many lemmas developed earlier in the book that it is hard to isolate the main argument and judge its complexity.


Proof sketch: For a Polish space $X$, denote by $\mathcal{M}_1(X)$ the space of probability measures on $X$, equipped with the topology of weak convergence. To show that the sequence of laws of the empirical measures $(Z^n)$ is relatively compact in $\mathcal{M}_1(\mathcal{M}_1(E))$, it suffices (by the standard tightness criterion for random measures, together with Prokhorov's theorem) to find, for every $\varepsilon > 0$, a compact set $K\subset E$ such that $\mathbb{P}(Z^n(E\setminus K) > \varepsilon) < \varepsilon$ for all $n\in \mathbb{N}$. Given $\varepsilon > 0$, there exists a compact $K\subset E$ with $\mathbb{P}(X_1 \in E\setminus K) < \varepsilon^2$, so
$$
\mathbb{P}(Z^n(E\setminus K) > \varepsilon) = \mathbb{P}\left( \sum_{i=1}^n \mathbf{1}_{\{X_i \in E\setminus K\}} > n\varepsilon \right) \le \frac{1}{n\varepsilon} \sum_{i=1}^n \mathbb{P}(X_i \in E\setminus K) < \varepsilon
$$

for every $n\in \mathbb{N}$, where the inequality is Markov's and we used that all $X_i$ have the same marginal distribution by exchangeability.
Now suppose that a subsequence (again denoted by $Z^n$ for ease of notation) converges in law to a random probability measure, $Z^n \Rightarrow \mu$. Let $k\in \mathbb{N}$, and let $f_1,\ldots, f_k \colon E \to \mathbb{R}$ be continuous and bounded. Then $m \mapsto \int f_1 \, dm \cdots \int f_k \, dm$ is a continuous bounded functional on $\mathcal{M}_1(E)$, so
\begin{align}
\mathbb{E}\left[\int f_1 d\mu \ldots \int f_k d\mu\right]
&= \lim_{n\to \infty} \mathbb{E}\left[\int f_1 \, dZ^n \ldots \int f_k \, dZ^n\right]\\
&=\lim_{n\to \infty} \frac{1}{n^k}\sum_{l_1,\ldots, l_k=1}^n \mathbb{E}\left[ f_1(X_{l_1}) \ldots f_k(X_{l_k})\right]\\
&= \mathbb{E}\left[ f_1(X_1) \ldots f_k(X_k)\right],
\end{align}

where the last equality uses exchangeability and the fact that the number of summands in which at least two indices coincide is $O(n^{k-1})$. This implies uniqueness of the subsequential limit, and thus convergence in law of the whole sequence $Z^n$ to a random probability measure $\mu$ satisfying \eqref{eq1}. Almost-sure convergence of the $Z^n$ then follows from the general fact that the empirical measures of an i.i.d. sequence converge almost surely to the distribution from which the samples are drawn, applied conditionally on $\mu$.
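For completeness, the collision-count step can be made explicit. The following is a routine expansion under the stated boundedness assumption ($|f_j| \le C$ for all $j$, which holds since the $f_j$ are bounded):

```latex
% The k-tuples (l_1,\dots,l_k) in \{1,\dots,n\}^k with all indices distinct
% number n(n-1)\cdots(n-k+1); the remaining tuples, with at least one
% repeated index, number n^k - n(n-1)\cdots(n-k+1) = O(n^{k-1}).  With
% |f_j| \le C, those terms contribute at most C^k \cdot O(n^{k-1}) to the
% sum, i.e. O(1/n) after dividing by n^k.  By exchangeability, each
% distinct-index tuple contributes the same expectation, so
\frac{1}{n^k}\sum_{l_1,\ldots,l_k=1}^{n}
   \mathbb{E}\bigl[f_1(X_{l_1})\cdots f_k(X_{l_k})\bigr]
 = \frac{n(n-1)\cdots(n-k+1)}{n^k}\,
   \mathbb{E}\bigl[f_1(X_1)\cdots f_k(X_k)\bigr] + O\!\left(\tfrac{1}{n}\right)
 \xrightarrow{\;n\to\infty\;} \mathbb{E}\bigl[f_1(X_1)\cdots f_k(X_k)\bigr].
```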


Hewitt, Edwin; Savage, Leonard J., "Symmetric measures on Cartesian products", Trans. Amer. Math. Soc. 80 (1955), 470–501. ZBL0066.29604.

Best Answer

For closure: a proof with the arguments @PeterKoepernik is interested in, following the sketch above, is present in the literature. It can be found in Klenke's Probability Theory (2nd ed.), Chapter 13.4, p. 269, where the author attributes the idea to Götz Kersting.