How can one understand almost-sure convergence in a physically meaningful way

education, intuition, probability-theory, stochastic-processes

I'm using the "share your knowledge, Q&A-style" feature on StackExchange. I guess it's not used so much on MathSE, but I hope it will be okay, and I would certainly love any feedback or comments (or alternative answers!). Thanks.


Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space. A "random variable" here will always mean an $(\mathcal{F},\mathcal{B}(\mathbb{R}))$-measurable function from $\Omega$ to $\mathbb{R}$. We follow the standard notational convention that, for random variables $X,Y,\text{etc.}$,
$$ \mathbb{P}(\text{statement about } X,Y,\text{etc.}) := \mathbb{P}(\{\omega \in \Omega \, : \, \text{statement about } X(\omega),Y(\omega),\text{etc.}\})\text{.} $$

Two notions of "convergence of random variables" that one learns about in a first course on probability theory are:

Convergence in probability. A sequence of random variables $X_n$ is said to converge in probability to a random variable $X$ if for every $\varepsilon>0$,
$$ \mathbb{P}(X_n \in [X-\varepsilon,X+\varepsilon]) \to 1 \ \text{ as } n \to \infty\text{.} $$

Almost-sure convergence. A sequence of random variables $X_n$ is said to converge almost surely to a random variable $X$ if
$$ \mathbb{P}( X_n \to X \ \text{ as } n \to \infty ) = 1. $$
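
To make the contrast concrete before discussing it, here is a minimal simulation sketch. The sequence used — independent $X_n$ with $\mathbb{P}(X_n=1)=1/n$ and $\mathbb{P}(X_n=0)=1-1/n$, target $X=0$ — and all parameters are illustrative assumptions: this sequence converges in probability, but since $\sum_n 1/n = \infty$, the second Borel–Cantelli lemma gives $X_n = 1$ infinitely often almost surely, so it does not converge almost surely.

```python
import numpy as np

rng = np.random.default_rng(0)
eps, n_paths, N = 0.5, 10_000, 1000   # illustrative parameters

# Independent X_n with P(X_n = 1) = 1/n, P(X_n = 0) = 1 - 1/n; target X = 0.
ns = np.arange(1, N + 1)
hits = rng.random((n_paths, N)) < 1.0 / ns   # hits[i, n-1] == (X_n == 1) on path i

# Convergence in probability: P(X_n in [X - eps, X + eps]) -> 1.
print("P(|X_n| <= eps) at n = 1000:", np.mean(~hits[:, -1]))            # ~0.999

# Almost-sure convergence would need whole tails to stay near 0, but
# P(no hit in n = 500..1000) = prod_{m=500}^{1000} (1 - 1/m) = 499/1000,
# and it only shrinks further as the horizon grows.
print("P(no hit in 500..1000):", np.mean(~hits[:, 499:].any(axis=1)))   # ~0.5
```

The first probability climbs to $1$ while the second falls toward $0$ as the horizon grows: the sequence is close to its target at any fixed large time, yet almost every sample path keeps leaving the $\varepsilon$-neighbourhood.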

Now the notion of convergence in probability has a fairly intuitively clear physical interpretation: "if you wait a very long time, then with very high probability you will be very near your target $X$".

By contrast, while it is not hard to follow the technical details of the definition of almost-sure convergence, it is much less clear how one should "physically" understand the difference between a sequence of random variables that converges almost surely to some target random variable $X$ and one that does not.

[At least, I, for one, have thought for a long time that "convergence as time tends to $\infty$ according to almost every sample realisation of the universe" – in and of itself – has no physical meaning in a finite-time non-retrialable universe where the purpose of probability theory is, roughly speaking, to provide a physically relevant notion of likelihoods.]

Is there any physically intuitive way of understanding almost-sure convergence, or an equivalent (even if technically more complicated!) definition that elucidates the physical intuition more clearly?

Best Answer

Yes, there is a more physically intuitive way of understanding almost-sure convergence!

Let's first return to convergence in probability. Intuitively, this says:

  • If you wait a very long time, then with very high probability, $X_n$ will be very close to $X$.

But now let's ask: what if you want not only $X_n$ but also $X_{n+1},\ldots,X_{n+k}$ (for some $k \in \mathbb{N}$) to be close to $X$? In other words, we are concerned with the following problem:

  • "If I wait a sufficiently long time, do I have that with very high probability, I will not only be very close to the target $X$, but will also subsequently remain close to the target for some a-priori-specified duration?"

It turns out that this is also covered by convergence in probability. Namely, it is an easy exercise to prove the following:

Lemma. Suppose $X_n$ converges in probability to $X$, and fix any $k \in \mathbb{N}_0$. For every $\varepsilon>0$, $$ P_k(\varepsilon,n) := \mathbb{P}(X_n,\ldots,X_{n+k} \in [X-\varepsilon,X+\varepsilon]) \to 1 \ \text{ as } n \to \infty\text{.} $$

In other words, even though the definition of convergence in probability only says that $P_0(\varepsilon,n) \to 1$ as $n \to \infty$ for each $\varepsilon$, it actually follows that $P_k(\varepsilon,n) \to 1$ as $n \to \infty$ for each $\varepsilon$ and every $k \in \mathbb{N}_0$.
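
For the illustrative independent-Bernoulli sequence from the simulation sketch above (with $\mathbb{P}(X_n=1)=1/n$, target $X=0$, and $\varepsilon<1$), the Lemma can even be checked in closed form: independence gives $P_k(\varepsilon,n) = \prod_{i=n}^{n+k}\bigl(1-\tfrac{1}{i}\bigr) = \tfrac{n-1}{n+k}$, which indeed tends to $1$ as $n \to \infty$ for each fixed $k$. A few lines of code confirm this:

```python
def P_k(n: int, k: int) -> float:
    """P(X_n = ... = X_{n+k} = 0) for independent X_i with P(X_i = 1) = 1/i."""
    p = 1.0
    for i in range(n, n + k + 1):
        p *= 1.0 - 1.0 / i
    return p   # the product telescopes to (n - 1) / (n + k)

for n in (10, 100, 1000, 10_000):
    print(n, round(P_k(n, k=5), 4))   # 0.6, 0.9429, 0.994, 0.9994 -> 1
```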

But now, suppose $k$ is not a-priori-specified; instead, I need to be able to make allowance for arbitrarily large $k$. In other words, suppose I want the convergence to be "sufficiently reliable that no matter how large the necessary $k$ might turn out to be, I can control how long I need to wait to have high probability that $X_n,\ldots,X_{n+k}$ are all close to $X$".

For this problem, it turns out that convergence in probability is not enough. Indeed, if you do the exercise of proving the Lemma (say, via a union bound over the $k+1$ indices $n,\ldots,n+k$) and then look at your proof, you will already see a warning sign: your bound on how long you have to wait for $P_k(\varepsilon,n)$ to be close to $1$ will be an expression that typically tends to $\infty$ as $k \to \infty$!
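
The closed form above makes this failure explicit: there, $1 - P_k(\varepsilon,n) = \tfrac{k+1}{n+k}$, so demanding $P_k(\varepsilon,n) \geq 0.99$ forces $n \geq 99k + 100$ — a waiting time growing linearly in $k$, with no bound that works for all $k$ at once. A short sketch (same illustrative example; the $0.99$ threshold is an arbitrary choice):

```python
import math

def wait_time(k: int, level: float = 0.99) -> int:
    """Smallest n with (n - 1)/(n + k) >= level, i.e. n >= (level*k + 1)/(1 - level)."""
    return math.ceil((level * k + 1) / (1 - level))

for k in (0, 10, 100, 1000):
    print(k, wait_time(k))   # 100, 1090, 10000, 99100: grows without bound in k
```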

This is where "almost-sure convergence" comes in!

Alternative definition of almost-sure convergence. We say that $X_n$ converges almost surely to $X$ if for every $\varepsilon>0$, $$ P_k(\varepsilon,n) \to 1 \ \text{ as } n \to \infty \text{ uniformly across } k \in \mathbb{N}_0\text{.} $$

In other words, "almost-sure convergence" is precisely the stronger version of convergence in probability that makes allowance for arbitrarily large $k$.


Proof that the alternative definition is equivalent to the usual definition. Let
$$ Y_{n,k} := \max\{ |X_n-X|, \ldots, |X_{n+k}-X| \}, \qquad Y_{n,\infty} := \sup_{i \geq n} |X_i-X|\text{.} $$
For each $n$ and $\omega$, we have $Y_{n,k}(\omega) \nearrow Y_{n,\infty}(\omega)$ as $k \to \infty$. Hence the set
$$ E_{n,\infty,\varepsilon} := \{Y_{n,\infty} \leq \varepsilon\} $$
is the intersection over $k \in \mathbb{N}_0$ of the decreasing-in-$k$ sequence of sets
$$ E_{n,k,\varepsilon} := \{Y_{n,k} \leq \varepsilon\}\text{.} $$
By continuity of $\mathbb{P}$ from above, $\mathbb{P}(E_{n,\infty,\varepsilon}) = \inf_{k \in \mathbb{N}_0} P_k(\varepsilon,n)$, and thus
\begin{align*} \sup_{k \in \mathbb{N}_0} \bigl(1-P_k(\varepsilon,n)\bigr) &= 1-\mathbb{P}(E_{n,\infty,\varepsilon}) \\ &= 1 - \mathbb{P}(X_i \in [X-\varepsilon,X+\varepsilon] \ \text{ for all } i \geq n)\text{.} \end{align*}
Therefore, our alternative definition is equivalent to saying that for every $\varepsilon>0$,
$$ \mathbb{P}(X_i \in [X-\varepsilon,X+\varepsilon] \ \text{ for all } i \geq n) \to 1 \ \text{ as } n \to \infty\text{.} $$

It remains to show that this is equivalent to the usual definition. First observe that the set
$$ E_{\infty,\infty,\varepsilon} := \{X_i \in [X-\varepsilon,X+\varepsilon] \ \text{ for all sufficiently large } i\} $$
is the union over $n$ of the increasing-in-$n$ sequence of sets $E_{n,\infty,\varepsilon}$, and that the set
$$ E_{\infty,\infty,0} := \{X_n \to X \ \text{ as } n \to \infty\} $$
is the intersection over $\varepsilon>0$ of the decreasing-in-$\varepsilon$ family of sets $E_{\infty,\infty,\varepsilon}$ (by monotonicity, it suffices to intersect over a sequence $\varepsilon_m \downarrow 0$). Hence, by continuity of $\mathbb{P}$ from below and then from above,
$$ \mathbb{P}(E_{\infty,\infty,0}) = \lim_{\varepsilon \downarrow 0} \, \lim_{n \to \infty} \, \mathbb{P}(E_{n,\infty,\varepsilon})\text{,} $$
where the inner limit is increasing in $n$ and the outer limit is decreasing in $\varepsilon$. It follows that $\mathbb{P}(E_{\infty,\infty,0})=1$ if and only if $\lim_{n \to \infty} \mathbb{P}(E_{n,\infty,\varepsilon})=1$ for every $\varepsilon>0$. $\quad\square$