[Math] Almost sure weak convergence of empirical measure

convergence-divergence, measure-theory, probability, probability-theory, weak-convergence

Do empirical measures converge weakly to the underlying measure almost surely? In particular, suppose $\mu$ is a Borel probability measure on $\mathbb R^d$ and that $X_1,X_2,\dots$ are IID draws from $\mu$. Let $\hat\mu_N=\frac{1}{N}\sum_{i=1}^N\delta_{X_i}$. The strong law of large numbers says that $\mathbb P(\hat\mu_N(A)\to\mu(A))=1$ for every fixed Borel set $A$. Almost sure weak convergence, on the other hand, would be that $\mathbb P(\hat\mu_N(A)\to\mu(A)\text{ for every continuity set $A$})=1$, with a single null set serving all continuity sets at once. Is that statement true?
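To make the objects concrete, here is a minimal Python sketch of $\hat\mu_N(A)$ along a single sample path. It is only an illustration under assumptions not in the question: $\mu$ is taken to be the standard Gaussian on $\mathbb R^2$, $A$ is the closed unit disc (a continuity set, since its boundary circle has $\mu$-measure zero), and the function names are hypothetical.

```python
import math
import random


def empirical_measure_of_disc(n, radius=1.0, seed=0):
    """Return mu_hat_N(A) for A = closed disc of the given radius,
    where X_1, ..., X_n are IID standard Gaussian points in R^2 (assumed mu)."""
    rng = random.Random(seed)  # fixed seed: smaller n reuses a prefix of the same path
    hits = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rng.gauss(0.0, 1.0)
        if x * x + y * y <= radius * radius:
            hits += 1
    return hits / n


def true_measure_of_disc(radius=1.0):
    """mu(A) in closed form: |X|^2 is chi-squared with 2 degrees of freedom,
    so P(|X| <= r) = 1 - exp(-r^2 / 2)."""
    return 1.0 - math.exp(-radius * radius / 2.0)


if __name__ == "__main__":
    target = true_measure_of_disc()
    for n in (100, 10_000, 200_000):
        estimate = empirical_measure_of_disc(n)
        print(n, estimate, abs(estimate - target))
```

This only checks one fixed set $A$, which is what the strong law already covers; the question is whether the same sample path works for all continuity sets simultaneously.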

Note that this is very different from the stronger statement $\mathbb P(\sup_{\text{continuity set $A$}}\left|{\hat\mu_N(A)-\mu(A)}\right|\to0)=1$, which would require the continuity sets to constitute a Glivenko–Cantelli class, something that only happens in $d=1$ as far as I understand.

Applying the Hewitt–Savage zero–one law (https://en.wikipedia.org/wiki/Hewitt%E2%80%93Savage_zero-one_law), it's clear that $\mathbb P(\hat\mu_N(A)\to\mu(A)\text{ for every continuity set $A$})$ is either $1$ or $0$ and nowhere in between. Can we argue it cannot be $0$?

If this is not true in general, are there reasonable sufficient conditions on $\mu$ that would guarantee almost sure weak convergence of $\hat\mu_N$? My feeling is that something like compactness of the support should do it.

Thanks.

Best Answer

The result is true.

Theorem 11.4.1 in Real Analysis and Probability by R. M. Dudley shows that the empirical measures converge weakly to $\mu$ almost surely for any Borel probability measure $\mu$ on a separable metric space $(S,d)$. Since $\mathbb R^d$ is a separable metric space, no extra condition such as compact support is needed. I can provide more detail if you can't find this reference.
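For orientation, here is a sketch of the standard argument for results of this type, given as an outline rather than as Dudley's exact proof. It assumes a countable family $\{f_k\}_{k\ge1}$ of bounded continuous functions on $S$ that determines weak convergence; the symbols $f_k$ and $\Omega_k$ are notation introduced here, and the existence of such a family on a separable metric space is the nontrivial step.

$$
\Omega_k=\Big\{\int f_k\,d\hat\mu_N=\frac1N\sum_{i=1}^N f_k(X_i)\to\int f_k\,d\mu\Big\},\qquad \mathbb P(\Omega_k)=1\ \text{ for each $k$ by the strong law, since $f_k$ is bounded,}
$$

$$
\mathbb P\Big(\bigcap_{k\ge1}\Omega_k\Big)=1\ \text{ as a countable intersection of almost sure events.}
$$

On $\bigcap_{k}\Omega_k$ we have $\int f_k\,d\hat\mu_N\to\int f_k\,d\mu$ for every $k$, hence $\hat\mu_N\to\mu$ weakly, and the portmanteau theorem then gives $\hat\mu_N(A)\to\mu(A)$ for every continuity set $A$ simultaneously. The point is that the exceptional null set is chosen once, for countably many test functions, rather than separately for each of the uncountably many sets $A$.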