[Math] Proof of Glivenko-Cantelli Theorem for non-continuity points

convergence-divergence, limits, probability-distributions, proof-verification, sequences-and-series

I have a question on the proof of the Glivenko–Cantelli theorem. Consider i.i.d. random variables $X_1,\ldots,X_n$ defined on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ (they can be discrete or continuous) with common cdf $F(t):=\mathbb{P}(X \leq t)$ $\forall t \in \mathbb{R}$, where $X$ has the common distribution of the $X_i$. Let $F_n(t):=\frac{1}{n}\sum_{i=1}^n 1(X_i\leq t)$, where $1(\cdot)$ is the indicator function taking value $1$ if the condition inside holds and $0$ otherwise.

Theorem: $ \sup_{t \in \mathbb{R}}|F_n(t)-F(t)| \rightarrow_{a.s.}0$ where $\rightarrow_{a.s.}$ denotes almost sure convergence.
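Before the proof, here is a minimal numerical sketch of the uniform convergence the theorem asserts (my own illustration, assuming NumPy and Uniform(0,1) data, so that $F(t)=t$; the helper name `ecdf_sup_distance` is mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def ecdf_sup_distance(x):
    """Exact sup_t |F_n(t) - F(t)| for Uniform(0,1) samples, where F(t) = t.

    The supremum is attained at (or just to the left of) an order statistic,
    so it suffices to compare each sorted point x_(i) with i/n and (i-1)/n.
    """
    x = np.sort(x)
    n = len(x)
    i = np.arange(1, n + 1)
    return max(np.max(i / n - x), np.max(x - (i - 1) / n))

# The sup distance shrinks as the sample size grows (roughly like 1/sqrt(n))
for n in [100, 10_000, 1_000_000]:
    print(n, ecdf_sup_distance(rng.uniform(size=n)))
```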

Proof (sketch)

(1) Fix $\varepsilon>0$ and show that we can construct a finite partition of $\mathbb{R}$, $-\infty=t_0<t_1<\cdots<t_k=\infty$, such that $\lim_{s\rightarrow t_j^-}F(s)-F(t_{j-1})<\varepsilon$ for $j=1,\ldots,k$
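As an aside, the partition in step (1) can be realized via the generalized inverse $F^{-1}$. The sketch below is my own illustration (the helper `partition_points` and the choice $t_j = F^{-1}(j/k)$ are not part of the proof), using the standard exponential as an example:

```python
import math

def partition_points(quantile, eps):
    """Interior partition points t_j = F^{-1}(j/k) with k = ceil(1/eps).

    `quantile` is the generalized inverse of the target cdf F (assumed
    available here). Between consecutive points, F can rise by at most
    1/k <= eps except through jumps at the t_j themselves, which is
    essentially the property step (1) asks for.
    """
    k = math.ceil(1 / eps)
    return [quantile(j / k) for j in range(1, k)]  # t_0 = -inf, t_k = +inf are implicit

# Example: standard exponential, F(t) = 1 - exp(-t), so F^{-1}(p) = -log(1 - p)
pts = partition_points(lambda p: -math.log(1 - p), eps=0.25)
print(pts)  # the 0.25, 0.5 and 0.75 quantiles of Exp(1)
```

For distributions with point masses, $F^{-1}$ can repeat values, which is precisely the $t_j = t_{j-1}$ possibility raised in the answer below.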

(2) Show that for all $j$ and all $t\in[t_{j-1},t_j]$:
$$
F_n(t)-F(t)\leq \lim_{s\rightarrow t_j^-}F_n(s)-\lim_{s\rightarrow t_j^-}F(s)+\varepsilon
$$
$$
F_n(t)-F(t)\geq F_n(t_{j-1})-F(t_{j-1})-\varepsilon
$$

(3) Show that $\sup_{j\in \{0,\ldots,k\}}|F_n(t_j)-F(t_j)|\rightarrow_{a.s.}0$; i.e., with probability 1 (wp1), $\forall \varepsilon>0$ $\exists n_\varepsilon \in \mathbb{N}$ such that $|F_n(t_j)-F(t_j)|\leq \varepsilon$ $\forall n \geq n_\varepsilon$ and $\forall j$

(4) Then for all $j$ and all $t\in[t_{j-1},t_j]$:
$$
F_n(t)-F(t)\leq \lim_{s\rightarrow t_j^-}F_n(s)-\lim_{s\rightarrow t_j^-}F(s)+\varepsilon\overbrace{\underbrace{\leq}_{\text{ wp1, } n\geq n_{\varepsilon}}}^{\star} 2\varepsilon
$$

$$
F_n(t)-F(t)\geq F_n(t_{j-1})-F(t_{j-1})-\varepsilon\underbrace{\geq}_{\text{ wp1, } n\geq n_{\varepsilon}} -2\varepsilon
$$

(5) Hence, wp1: $\forall \varepsilon>0$, $|F_n(t)-F(t)|\leq 2\varepsilon$ $\forall t$ and $\forall n \geq n_\varepsilon$, i.e. $\sup_{t\in\mathbb{R}}|F_n(t)-F(t)|\rightarrow_{a.s.}0$. Done!

Question: How can I do step $(\star)$ when $t_j$ is not a continuity point of $F(\cdot)$?

Best Answer

Clarification of definitions: define $F(t^-):=\mathbb{P}(X< t)$ and $F_n(t^-):=\frac{1}{n}\sum^n_{i=1} 1(X_i< t)$. This is consistent with the source I have linked to below. Given these definitions, $F_n ({t_j}^- )=\lim_{t\rightarrow t_j^{-}} F_n (t)$, where the limit is taken as $t$ approaches $t_j$ from the left: indeed $X_i< t_j$ iff $\exists t< t_j$ such that $X_i\leq t$, i.e. the events $(X_i\leq t)\uparrow (X_i< t_j)$ as $t\uparrow t_j$. Likewise, $F({t_j}^- )=\lim_{t\rightarrow t_j^{-}} F(t)$: once again the events $(X\leq t)\uparrow (X< t_j)$ as $t\uparrow t_j$, and you can then use continuity of the probability measure.
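To make the two definitions concrete, here is a small sketch (my own illustration, assuming NumPy; `F_n` and `F_n_left` are illustrative names) on a sample with a tie at $t=1$:

```python
import numpy as np

def F_n(x, t):
    """Empirical cdf F_n(t): fraction of sample points <= t."""
    return np.mean(x <= t)

def F_n_left(x, t):
    """Left limit F_n(t^-): fraction of sample points strictly < t."""
    return np.mean(x < t)

x = np.array([0, 1, 1, 2, 3])  # sample with an atom at t = 1
print(F_n(x, 1))         # 0.6: three of the five points satisfy X_i <= 1
print(F_n_left(x, 1))    # 0.2: only one point satisfies X_i < 1
print(F_n(x, 1 - 1e-9))  # 0.2: F_n just to the left of 1 agrees with F_n_left
```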

First, a comment about step (1) where you construct a partition of $\mathbb{R}$. You need to allow for the possibility that sometimes we might have $t_j=t_{j-1}$ e.g. for discrete distributions with point masses.

Next, in step (1) you fixed $\varepsilon$, but in (3) you allow $\varepsilon$ to vary. It should be possible to rewrite (3) so that this confusion is avoided.

Now, on to the question about step $(\star)$. To see why a step of this kind is possible, it helps to look more closely at how step (3) was achieved. Step (3) holds because the Strong Law of Large Numbers can be applied to each $t_j$ individually, and the claim then follows because there are only finitely many $t_j$. More precisely, we apply the Strong Law to the sequence of random variables $1(X_n\le t_j)$ to obtain $F_n(t_j)\to F(t_j)$ a.s. However, we can also apply the Strong Law to the random variables $1(X_n< t_j)$, which yields $F_n({t_j}^-)\to F({t_j}^-)$ a.s. So the conclusion of step (3) can be strengthened by adding additional terms of the form $|F_n({t_j}^-)-F({t_j}^-)|$ to the 'sup'. Once this is done, the proof in step (4) should be straightforward.
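This strengthened step (3) can be checked numerically. Below is a sketch (my own illustration, assuming NumPy) with a discrete distribution that has an atom at $t_j=1$, so that $F(1)=0.7$ while $F(1^-)=0.2$; both empirical quantities converge to their own targets:

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete distribution with an atom at t_j = 1:
# P(X=0) = 0.2, P(X=1) = 0.5, P(X=2) = 0.3, so F(1) = 0.7 but F(1^-) = 0.2.
x = rng.choice([0, 1, 2], size=200_000, p=[0.2, 0.5, 0.3])

Fn_at_1 = np.mean(x <= 1)   # SLLN on 1(X_i <= 1): converges a.s. to F(1) = 0.7
Fn_left_1 = np.mean(x < 1)  # SLLN on 1(X_i < 1):  converges a.s. to F(1^-) = 0.2
print(Fn_at_1, Fn_left_1)
```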

There is a nice presentation of the proof in this handout.
