Almost sure convergence of $L_1$-Wasserstein distance between the empirical and true CDFs

Tags: cumulative-distribution-functions, measure-theory, probability-distributions, probability-theory, random-variables

Let $X_1, X_2, \ldots$ be i.i.d. from a distribution with cdf $F$. Let $F_n$ denote the empirical cdf $F_n(t) = \frac{1}{n}\sum\limits_{i=1}^n I(X_i \leq t)$.

How can we prove that $\int_{\mathbb{R}}|F(t) - F_n(t)|\text{ d}t \overset{a.s.}{\longrightarrow} 0$ if $\operatorname{E}|X_1| < \infty$?


del Barrio, Giné, and Matrán (1999) claim that this should follow from the Glivenko-Cantelli theorem, the law of large numbers, and the dominated convergence theorem.

Matsak (2006) claims that this should follow from the law of large numbers in Banach spaces.

Despite these references, I was unable to fill in the details to prove the claim.

Best Answer

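By the strong law of large numbers, for each fixed $t\in\mathbb R$ we have $$F_n(t)=\frac{1}{n}\sum_{i=1}^n I(X_i\leq t) \rightarrow \mathbb E[I(X_1\leq t)] = \mathbb P[X_1\leq t] = F(t)$$ almost surely. The exceptional null set here may depend on $t$; the Glivenko-Cantelli theorem strengthens this to $\sup_{t}|F_n(t)-F(t)|\rightarrow 0$ almost surely, so the convergence holds for all $t$ simultaneously on a single event of probability one.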

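Now we have $$|F_n(t)-F(t)| =|(1-F(t))-(1-F_n(t))| \leq (1-F_n(t)) + (1-F(t)),$$ as well as $$|F_n(t)-F(t)|\leq F_n(t)+F(t).$$ Use the first inequality for $t\geq 0$ and the second for $t<0$ to get an integrable upper bound $g_n(t)$ on $|F_n(t)-F(t)|$ for all $t\in\mathbb R$. This bound depends on $n$ through $F_n$, so the plain dominated convergence theorem does not apply directly. However, since $\int_{-\infty}^0 F_n(t)\,\mathrm dt + \int_0^\infty (1-F_n(t))\,\mathrm dt = \frac1n\sum_{i=1}^n |X_i|$ and the same identity for $F$ gives $\mathbb E|X_1|$, we get $$\int_{\mathbb R} g_n(t)\,\mathrm dt = \frac1n\sum_{i=1}^n |X_i| + \mathbb E|X_1| \rightarrow 2\,\mathbb E|X_1|$$ almost surely by the strong law of large numbers, and $2\,\mathbb E|X_1|$ is exactly the integral of the pointwise limit of $g_n$. On the almost sure event where $F_n(t)\rightarrow F(t)$ for all $t$ (Glivenko-Cantelli) and the above integrals converge, the generalized dominated convergence theorem (dominating functions that converge pointwise and whose integrals converge to the integral of the limit) gives $\int_{\mathbb R}|F_n(t)-F(t)|\,\mathrm dt\rightarrow 0$.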
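
As a numerical sanity check (not a proof), here is a minimal sketch that approximates $\int_{\mathbb R}|F(t)-F_n(t)|\,\mathrm dt$ for growing $n$. Everything specific in it is an assumption made for illustration: $F$ is taken to be the Exp(1) cdf, the helper name `l1_cdf_distance_exp` is hypothetical, and the integral over $[0, \max_i X_i]$ is approximated with a midpoint rule, while the tail beyond the largest observation is added in closed form.

```python
import bisect
import math
import random


def l1_cdf_distance_exp(sample, grid_size=20_000):
    """Approximate the integral of |F(t) - F_n(t)| dt, assuming F is the Exp(1) cdf."""
    xs = sorted(sample)
    n = len(xs)
    upper = xs[-1]  # largest observation; F_n(t) = 1 for t >= upper

    # Midpoint rule on [0, upper]. For t < 0 both F and F_n vanish,
    # so that part of the real line contributes nothing.
    dt = upper / grid_size
    body = 0.0
    for j in range(grid_size):
        t = (j + 0.5) * dt
        F = 1.0 - math.exp(-t)
        F_n = bisect.bisect_right(xs, t) / n  # fraction of X_i <= t
        body += abs(F - F_n) * dt

    # For t > upper we have |F - F_n| = 1 - F(t) = exp(-t),
    # whose integral over (upper, inf) is exp(-upper).
    tail = math.exp(-upper)
    return body + tail


rng = random.Random(0)  # fixed seed so the run is reproducible
distances = {}
for n in (100, 10_000, 100_000):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    distances[n] = l1_cdf_distance_exp(sample)
    print(n, distances[n])
```

The printed distances should shrink as $n$ grows, roughly like $n^{-1/2}$ for this light-tailed example. A heavier-tailed $F$ with finite mean would still converge, but possibly much more slowly.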