[Math] Question about weak convergence of random variables

general-topology, probability distributions, probability theory, weak-convergence

When you start to learn probability theory, for instance the central limit theorem, you learn about convergence in distribution $X_n\to X$ (where, say, both $X_n$ and $X$ are $\mathbb R$-valued random variables).

A first course usually starts with the pointwise convergence of the distribution functions $F_n(x)\to F(x)$ (at every continuity point $x$ of $F$), where $F_n(x)=\mathbb P(X_n\leq x)$ and $F(x)=\mathbb P(X\leq x)$, but it is not clear what this exactly implies (at least to me).

Then you learn that this is equivalent to saying that the law $\mu_n$ of $X_n$ converges to the law $\mu$ of $X$ in the weak topology. Here the weak topology means the weak-* topology obtained by embedding the space of (Borel) probability measures on $\mathbb R$ into the Banach space of continuous linear forms on the space of bounded continuous functions. Sometimes you look instead at the dual space of the continuous compactly supported functions, and then you speak of vague convergence. From now on, I take as the definition of convergence in law that $\int f d\mu_n\to\int fd\mu$ for every bounded continuous function $f$.

But what does it imply for larger spaces of test functions $f$?

When $f$ is continuous but not bounded, it is not enough: you need uniform integrability, fine.

But the other way around, what if $f$ is bounded but not continuous (though still measurable)? (What if you even assume that $X_n$ and $X$ take values in the same compact subset of $\mathbb R$?) In other words, if $f$ is bounded, what are the minimal regularity assumptions on $f$ under which we still have $\int fd\mu_n\to\int fd\mu$?

Moreover, what is a minimal set of test functions $f$ such that $\int fd\mu_n\to\int fd\mu$ for all $f$ in that set implies convergence in distribution? For instance, what if $f$ ranges over the space of Lipschitz functions?

Best Answer

For your first question: you cannot use any non-continuous $f$.

Suppose $f : \mathbb{R} \to \mathbb{R}$ is a measurable function which is not continuous. There exist measures $\mu_n, \mu$ such that $\mu_n \to \mu$ weakly but $\int f\,d\mu_n \not\to \int f\,d\mu$.

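Proof. If $f$ is not continuous then there exist points $x_n, x \in \mathbb{R}$ with $x_n \to x$ but $f(x_n) \not\to f(x)$. Let $\mu_n = \delta_{x_n}, \mu = \delta_x$ be point masses at $x_n, x$ respectively. (That is, take $X_n = x_n, X = x$ to be constant random variables.) Then $\mu_n \to \mu$ weakly, since $\int g\,d\mu_n = g(x_n) \to g(x) = \int g\,d\mu$ for every bounded continuous $g$, but $\int f\,d\mu_n = f(x_n) \not\to f(x) = \int f\,d\mu$.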
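
As a concrete illustration of the point-mass argument, here is a minimal sketch. The particular choices $f = 1_{[0,\infty)}$, $x_n = -1/n$, $x = 0$, the comparison function $\arctan$, and the helper name `integrate_point_mass` are assumptions made only for this example. Integrating against $\delta_{x_n}$ is just evaluation at $x_n$, so the discontinuous $f$ gives integrals stuck at 0 while the limit measure gives 1.

```python
import math

def f(x):
    """Bounded measurable test function with a jump at 0: indicator of [0, inf)."""
    return 1.0 if x >= 0 else 0.0

def g(x):
    """A bounded continuous test function, for comparison."""
    return math.atan(x)

def integrate_point_mass(h, point):
    """Integral of h against the point mass at `point`, which is simply h(point)."""
    return h(point)

ns = [1, 10, 100, 1000]

# Integrals against mu_n = delta_{-1/n}
f_integrals = [integrate_point_mass(f, -1.0 / n) for n in ns]
g_integrals = [integrate_point_mass(g, -1.0 / n) for n in ns]

# Integrals against the weak limit mu = delta_0
f_limit = integrate_point_mass(f, 0.0)
g_limit = integrate_point_mass(g, 0.0)

print("f against mu_n:", f_integrals, " f against mu:", f_limit)
print("g against mu_n:", g_integrals, " g against mu:", g_limit)
```

The continuous $g$ sees the convergence $x_n \to x$, whereas the jump of $f$ at the limit point is exactly what breaks it.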


For your second question: we want to know conditions on a set $\mathcal{C}$ of bounded continuous functions on $\mathbb{R}$ to guarantee the following statement: if $\int f\,d\mu_n \to \int f\,d\mu$ for all $f \in \mathcal{C}$, then $\mu_n \to \mu$ weakly.

A helpful sufficient condition is: the uniform closure of the linear span of $\mathcal{C}$ contains the set $C_c(\mathbb{R})$ of continuous compactly supported functions.

First note that if $\int f\,d\mu_n \to \int f\,d\mu$ for all $f \in \mathcal{C}$, then by linearity of the integral, the same holds for $f$ which are (finite) linear combinations of functions from $\mathcal{C}$. Also, if $f_m \to f$ uniformly and the above condition holds for all $f_m$, then a triangle inequality argument shows it also holds for $f$. So the condition holds on the uniform closure of the span of $\mathcal{C}$.

Now if the condition holds on $C_c(\mathbb{R})$, then we get $\mu_n \to \mu$ weakly. This is a standard, but not quite trivial, fact. The proof goes something like this.

Given $\epsilon > 0$, find a compact set $K$ large enough that $\mu(K) \ge 1-\epsilon$. Choose $f \in C_c(\mathbb{R})$ that is 1 on $K$ and satisfies $0 \le f \le 1$. From the fact that $\int f\,d\mu_n \to \int f\,d\mu$, deduce that the sequence $\{\mu_n\}$ is tight. By Prohorov's theorem, $\{\mu_n\}$ is weakly precompact. Using the "double subsequence" trick, it now suffices to show that the only possible subsequential weak limit is $\mu$.

Suppose $\nu$ is a weak limit of some subsequence $\mu_{n_k}$. For every $f \in C_c(\mathbb{R})$, we have $$\int f\,d\nu = \lim_{k \to \infty} \int f \,d\mu_{n_k} = \int f\,d\mu.$$ Now for any open set $U$, we can find a sequence $f_m \in C_c(\mathbb{R})$ with $f_m \to 1_U$ pointwise and boundedly. By dominated convergence, $$\nu(U) = \lim_{m \to \infty} \int f_m\,d\nu = \lim_{m \to \infty} \int f_m\,d\mu = \mu(U).$$ Now use a monotone class argument to show $\nu(B) = \mu(B)$ for all Borel sets. Hence $\nu = \mu$.

Some examples of classes $\mathcal{C}$ satisfying this condition:

  • Compactly supported, piecewise linear functions

  • $C^\infty$ compactly supported functions

  • Bounded Lipschitz functions (by either of the above)
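
To see why such classes satisfy the sufficient condition, the sketch below approximates a continuous compactly supported function uniformly by compactly supported piecewise linear (hence Lipschitz) functions. The target $g(x) = \sqrt{\max(0, 1-x^2)}$, the uniform mesh, and the function names are assumptions chosen for illustration; $g$ is deliberately not Lipschitz near $\pm 1$, so it lies in the closure of the class without belonging to it. The sup error is only estimated on a finite sample grid.

```python
import math

def g(x):
    """Continuous, supported in [-1, 1], but not Lipschitz near x = -1 and x = 1."""
    return math.sqrt(max(0.0, 1.0 - x * x))

def piecewise_linear(x, n_cells):
    """Linear interpolant of g on a uniform mesh of [-1, 1]; zero outside [-1, 1]."""
    if x <= -1.0 or x >= 1.0:
        return 0.0
    h = 2.0 / n_cells
    i = min(int((x + 1.0) / h), n_cells - 1)  # index of the mesh cell containing x
    left = -1.0 + i * h
    t = (x - left) / h
    return (1.0 - t) * g(left) + t * g(left + h)

def sup_error(n_cells, samples=4001):
    """Estimate sup |g - interpolant| on a sample grid covering [-2, 2]."""
    grid = [-2.0 + 4.0 * j / (samples - 1) for j in range(samples)]
    return max(abs(g(x) - piecewise_linear(x, n_cells)) for x in grid)

errors = [sup_error(n) for n in (4, 16, 64, 256)]
for n, err in zip((4, 16, 64, 256), errors):
    print(f"cells = {n:4d}   estimated sup error = {err:.4f}")
```

The estimated errors shrink as the mesh is refined, which is the uniform approximation needed to pass from the piecewise linear class to all of $C_c(\mathbb{R})$.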


Another class $\mathcal{C}$ of functions that works, for different reasons, is $\{e^{itx} : t \in \mathbb{R}\}$. This is Lévy's continuity theorem. I don't think this $\mathcal{C}$ satisfies the previous sufficient condition.


A new question was posed in comments: suppose the probability measures $\mu_n, \mu$ are all absolutely continuous with respect to Lebesgue measure $m$, with densities $h_n, h$. Suppose moreover that they are all supported inside $[0,1]$. If $\mu_n \to \mu$ weakly, does it follow that $\int f\,d\mu_n \to \int f\,d\mu$ for all bounded measurable $f$?

Answer: No, it does not.

Let $E \subset [0,1]$ be a fat Cantor set, or your other favorite set which is closed, nowhere dense, and has positive Lebesgue measure. Let $h = \frac{1}{m(E)} 1_E$. For $1 \le k \le 2^n$, let $I_{n,k}$ denote the open interval $((k-1)2^{-n}, k2^{-n})$. Define $h_n$ by $$h_n = \sum_{k=1}^{2^n} \frac{m(I_{n,k} \cap E)}{m(I_{n,k} \setminus E)\, m(E)} 1_{I_{n,k} \setminus E}.$$

Note that since $E$ is closed and nowhere dense, for any $n,k$ we have that $I_{n,k} \setminus E$ is open and nonempty; in particular it has positive measure. So this definition makes sense.

Let $d\mu_n = h_n\,dm$, $d\mu = h \,dm$ be the measures with the corresponding densities. Observe that $\mu_n(E) = 0$ for all $n$. If we take $f = 1_E$, which is bounded measurable (and even upper semicontinuous), then clearly $\int f\,d\mu_n = 0$ while $\int f\,d\mu = 1$. Now I claim that $\mu_n \to \mu$ weakly, which will complete the counterexample.

By construction, $\mu_n(I_{n,k}) = \frac{m(I_{n,k} \cap E)}{m(E)} = \mu(I_{n,k})$. Also observe that for $n \le m$ and $1 \le k \le 2^{n}$ we have $$I_{n,k} = \left(\bigcup_{j=(k-1)2^{m-n}+1}^{k 2^{m-n}} I_{m,j}\right) \cup \text{ finitely many points} \tag{*}$$ and thus, since finite sets are null for these absolutely continuous measures, $$\mu_m(I_{n,k}) = \sum_j \mu_m(I_{m,j}) = \sum_j \mu(I_{m,j}) = \mu(I_{n,k}) = \mu_n(I_{n,k})$$ where the sums are taken over the same $j$ as in (*).

Let $g : [0,1] \to \mathbb{R}$ be continuous and let $\epsilon > 0$. By uniform continuity, for all sufficiently large $n$ we have that the oscillation of $g$ on every interval of length $2^{-n}$ is less than $\epsilon$. So if we set $$\tilde{g} = \sum_{k=1}^{2^n} g(k 2^{-n}) 1_{I_{n,k}}$$ then $|g - \tilde{g}| < \epsilon$ almost everywhere. Now note that $$\int \tilde{g}\,d\mu_n = \sum_{k=1}^{2^n} g(k 2^{-n}) \mu_n(I_{n,k}) = \sum_{k=1}^{2^n} g(k 2^{-n}) \mu(I_{n,k}) = \int \tilde{g}\,d\mu.$$ Hence $$\left|\int g\,d\mu_n - \int g\,d\mu\right| = \left| \int (g-\tilde{g})\,d\mu_n + \int (\tilde{g}-g)\,d\mu \right| \le 2\epsilon.$$ So $\int g\,d\mu_n \to \int g\,d\mu$ and we have shown weak convergence.
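
The bookkeeping in this counterexample can be checked numerically. The sketch below is only an approximation under stated assumptions: it replaces $E$ by a finite stage $E_J$ of the Smith-Volterra-Cantor construction (remove a middle gap of length $4^{-j}$ at step $j$), which is a finite union of closed intervals and therefore not nowhere dense. For $n \le J$ each $I_{n,k} \setminus E_J$ still has positive measure, so the same definition of $h_n$ makes sense. The test function $g(x) = x^2$, passed through its antiderivative `G`, and all function names are hypothetical choices for the example.

```python
from fractions import Fraction

def svc_stage(depth):
    """Closed intervals of stage `depth` of the Smith-Volterra-Cantor set."""
    intervals = [(Fraction(0), Fraction(1))]
    for j in range(1, depth + 1):
        gap = Fraction(1, 4**j)
        nxt = []
        for a, b in intervals:
            mid = (a + b) / 2
            nxt.append((a, mid - gap / 2))  # left piece
            nxt.append((mid + gap / 2, b))  # right piece
        intervals = nxt
    return intervals

def clip(intervals, lo, hi):
    """Intersect sorted disjoint intervals with [lo, hi], dropping empty pieces."""
    out = []
    for a, b in intervals:
        a2, b2 = max(a, lo), min(b, hi)
        if a2 < b2:
            out.append((a2, b2))
    return out

def complement(intervals, lo, hi):
    """Complement in [lo, hi] of sorted disjoint intervals contained in [lo, hi]."""
    out, cur = [], lo
    for a, b in intervals:
        if cur < a:
            out.append((cur, a))
        cur = b
    if cur < hi:
        out.append((cur, hi))
    return out

def length(intervals):
    return sum(b - a for a, b in intervals)

def integral(G, intervals):
    """Integral of g over a union of intervals, given an antiderivative G of g."""
    return sum(G(b) - G(a) for a, b in intervals)

def compare_integrals(n, depth, G):
    """Return (integral of g d mu_n, integral of g d mu) with E replaced by E_depth."""
    E = svc_stage(depth)
    mE = length(E)
    int_mu = integral(G, E) / mE
    int_mu_n = Fraction(0)
    for k in range(1, 2**n + 1):
        lo, hi = Fraction(k - 1, 2**n), Fraction(k, 2**n)
        inside = clip(E, lo, hi)             # I_{n,k} intersected with E
        outside = complement(inside, lo, hi)  # I_{n,k} minus E, where h_n lives
        weight = length(inside) / (length(outside) * mE)
        int_mu_n += weight * integral(G, outside)
    return int_mu_n, int_mu

G = lambda x: x**3 / 3  # antiderivative of the test function g(x) = x^2
for n in range(1, 7):
    a, b = compare_integrals(n, 8, G)
    print(f"n = {n}: |int g dmu_n - int g dmu| = {float(abs(a - b)):.6f}")
```

By construction $h_n$ vanishes on the stand-in set, so $\int 1_E\,d\mu_n = 0$ while $\int 1_E\,d\mu = 1$ for every $n$, even though the gap for the continuous $g$ is bounded by the oscillation of $g$ on dyadic intervals of length $2^{-n}$.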
