Weak vs strong law of large numbers

law-of-large-numbers, measure-theory, probability, probability-theory

Let $\{X_n\}_{n \in \mathbb{N}}$ be a sequence of real i.i.d. random variables with mean $\mu$. Let $S_n$ be the average of the first $n$ elements of this sequence,
$$S_n = \frac1n\sum_{i=1}^n X_i.$$
Then the weak law of large numbers states that for any $\epsilon > 0$
$$\lim_{n\rightarrow \infty}P(|S_n - \mu| > \epsilon) = 0 \tag{1}$$
and the strong law of large numbers states that
$$P(\lim_{n \rightarrow \infty}|S_n - \mu| = 0) = 1. \tag{2}$$
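(To make the weak law concrete, here is a quick simulation sketch; this is purely illustrative, with fair coin flips so that $\mu = 0.5$, and arbitrary choices of $\epsilon$ and sample sizes. It estimates the probability in (1) at several fixed $n$ by the fraction of simulated paths outside the $\epsilon$-band.)

```python
import numpy as np

rng = np.random.default_rng(0)
mu, eps = 0.5, 0.05
n_paths, N = 5_000, 2_000

# Each row is one omega: a path of N fair coin flips (mean mu = 0.5).
flips = rng.integers(0, 2, size=(n_paths, N))
# Running sample means S_1, ..., S_N along each path.
S = flips.cumsum(axis=1) / np.arange(1, N + 1)

# Weak law: at each fixed n, the fraction of paths with |S_n - mu| > eps
# estimates P(|S_n - mu| > eps), which should shrink toward 0.
for n in (10, 100, 1_000, 2_000):
    frac = np.mean(np.abs(S[:, n - 1] - mu) > eps)
    print(f"n = {n:4d}: estimated P(|S_n - mu| > eps) ~ {frac:.4f}")
```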

I have read a number of questions regarding these two laws, but I'm still having some trouble seeing the subtle differences between them and what they're saying on a less technical level. I suspect part of my confusion could be related to a few questions I have on convergence in probability vs convergence a.s., which I've posted as a separate question.

My current intuition is as follows. For fixed $\epsilon$ the weak law tells us that $S_n$ is increasingly likely to be within $\epsilon$ of $\mu$ as $n \rightarrow \infty$. In other words, if we fix a large $n$ then it is likely that $S_n \in (\mu - \epsilon, \mu + \epsilon)$. However, since the probability is only $0$ in the limit (1), it can be nonzero at this fixed $n$, and so there is a chance that $S_n$ might fall outside of this interval. As $n$ increases, the likelihood of $S_n$ falling outside this range decreases. On Wikipedia they state

The weak law states that for a specified large $n$, the average $\overline{X}_n$ is likely to be near $\mu$. Thus, it leaves open the possibility that $|\overline{X}_n - \mu| > \epsilon$ happens an infinite number of times, although at infrequent intervals.

How did they conclude that $S_n$ might fall outside the range an infinite number of times?

What I am less sure about is what the strong law is exactly saying. For example, if we again choose a large $n$, wouldn't we again have nonzero probability that $S_n$ will fall outside of some neighborhood around $\mu$, since (2) only holds in the limit? However, on Wikipedia they say

The strong law shows that this (by this they are referring to the passage I quoted above) almost surely will not occur. It does not imply that with probability 1, we have that for any $\epsilon > 0$ the inequality $|\overline{X}_n - \mu| < \epsilon$ holds for all large enough $n$, since the convergence is not necessarily uniform on the set where it holds.

But I am not sure how this gives any more information than the weak law already does. How does it rule out the "bad" events occurring infinitely often, which the weak law leaves open?

As another example, in Ross's book on probability he says:

The weak law of large numbers states that, for any specified large value $n^*$, $(X_1 + \cdots + X_{n^*})/n^*$ is likely to be near $\mu$. However, it does not say that $(X_1 + \cdots + X_n)/n$ is bound to stay near $\mu$ for all $n$ larger than $n^*$. Thus, it leaves open the possibility that large values of $|(X_1 + \cdots + X_n)/n - \mu|$ can occur infinitely often (though at infrequent intervals). The strong law shows that this cannot occur. In particular, it implies that, with probability 1, for any positive value $\epsilon$, $$\Big| \sum_{i=1}^n \frac{X_i}{n} - \mu \Big|$$ will be greater than $\epsilon$ only a finite number of times.

But again, I do not see how his last claim follows from convergence a.s. as in (2).

Best Answer

I think a lot of the confusion in settings such as this can occur because the notation often used in statistics does not remind the reader that $S_n$ is not a number. It's a function. We have a probability space $\Omega$, and each $S_n$ is a function $S_n:\Omega\to\mathbb{R}$. For any $\omega\in \Omega$, $(S_n(\omega))_{n\in\mathbb{N}}$ is a sequence of numbers. For different $\omega$, we may get different sequences, which may have different behaviors.

Convergence in probability does not say anything about any particular $\omega$. It says something about $\Omega$ overall, roughly that $\mathbb{P}(\{\omega\in\Omega:|S_n(\omega)-\mu|>\epsilon\})$ is small for large $n$ (and, more precisely, that for any $\epsilon$, this sequence approaches zero as $n\to\infty$). This is what is meant when it says that $S_n$ is likely to be near $\mu$ for large $n$. But I think that the statement about $n^*$ is (without mentioning it) transitioning to talking about what's happening at a specific $\omega$. We can say that for a fixed $n^*$, $S_{n^*}$ is likely to be near $\mu$, but if $S_{n^*}(\omega)$ is near $\mu$ for some particular $\omega$, it is not necessarily the case that $S_n(\omega)$ is near $\mu$ for this same $\omega$ and later $n > n^*$. The strong law precludes this, because if $\omega\in \Omega$ is such that $\lim_n S_n(\omega)=\mu$, then for any $\epsilon>0$, $\{n:|S_n(\omega)-\mu|>\epsilon\}$ is finite. Note that both of these last two conditions are about a single $\omega$ and are about a sequence of numbers, $S_n(\omega)$, not random variables (functions) $S_n$. Conversely, if $\omega$ is such that $\{n:|S_n(\omega)-\mu|>\epsilon\}$ is infinite for some $\epsilon>0$, then $S_n(\omega)$ does not converge to $\mu$, and the strong law says that the set of such $\omega$ has probability zero.
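Here is a quick numerical sketch of this single-$\omega$ viewpoint (purely illustrative; fair coin flips with $\mu = 0.5$ and an arbitrary $\epsilon$). It tracks the exceedance set $\{n : |S_n(\omega)-\mu| > \epsilon\}$ along one simulated path; the strong law says this set is finite for almost every $\omega$, so the last exceedance time stabilizes as the path gets longer:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, eps, N = 0.5, 0.05, 100_000

# One fixed omega: a single path of coin flips, truncated at N,
# together with its running sample means S_n(omega).
flips = rng.integers(0, 2, size=N)
S = flips.cumsum() / np.arange(1, N + 1)

# The exceedance set {n : |S_n(omega) - mu| > eps} for this single omega.
bad = np.flatnonzero(np.abs(S - mu) > eps) + 1
print("exceedances seen:", bad.size)
print("last exceedance time:", bad[-1] if bad.size else "none")
```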

Let's imagine what could be happening to get convergence in probability but not convergence a.s. Let $\Omega=[0,1)$ endowed with the usual Lebesgue measure $\mathbb{P}$. Let $F^1_1:\Omega\to \{0,1\}$ be $1$ on the whole interval $[0,1)$. Let $F^2_1,F^2_2:\Omega\to\{0,1\}$ be such that $F^2_1$ is $1$ on the first half of $[0,1)$ and $0$ on the second half. Let $F^2_2$ be $1$ on the second half of $[0,1)$ and $0$ on the first. More generally, let $F^n_1,\ldots,F^n_n:[0,1)\to\{0,1\}$ be such that $F^n_k$ is $1$ on $[(k-1)/n,k/n)$ and zero on the rest of $[0,1)$.

Note that for $0<\epsilon<1$, $$\mathbb{P}(|F^n_k-0|>\epsilon)=1/n,$$ because the set where $|F^n_k-0|>\epsilon$ is $[(k-1)/n,k/n)$. So if $X_1,X_2,\ldots$ is an enumeration of $F^1_1,F^2_1,F^2_2,F^3_1,F^3_2,F^3_3,\ldots$, then $X_m$ converges to $0$ in probability. But for any particular point $\omega\in[0,1)$, $X_m(\omega)$ does not converge to $0$. This is because $X_m(\omega)$ will be $1$ for one of the functions $F^1_1$, one of the functions $F^2_1,F^2_2$, one of the functions $F^3_1,F^3_2,F^3_3$, etc. So these "bad sets" $A_m=\{\omega\in\Omega:|X_m(\omega)-0|>\epsilon\}$ have probabilities going to zero, but they bounce around enough so that each point occurs in infinitely many of the $A_m$ (though it takes longer and longer between the $m$ at which a particular $\omega$ appears). More specifically, the $A_m$ sequence is $A_1=[0,1)$, $A_2=[0,1/2)$, $A_3=[1/2,1)$, $A_4=[0,1/3)$, $A_5=[1/3,2/3)$, $A_6=[2/3,1)$, etc. So $\mathbb{P}(\{\omega\in\Omega:\lim_n X_n(\omega)=0\})=0$, not $1$.
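This construction is sometimes called the typewriter sequence, and it is easy to check numerically. The following sketch (purely illustrative) confirms that a fixed $\omega$ lands in $A_m$ once per block of length $n$, hence infinitely often, even though $\mathbb{P}(A_m) \to 0$:

```python
def X(m, omega):
    """Value at omega of the m-th function in the enumeration
    F^1_1, F^2_1, F^2_2, F^3_1, F^3_2, F^3_3, ..."""
    # Find the block n and the index k (1-based) with m = n(n-1)/2 + k.
    n = 1
    while n * (n + 1) // 2 < m:
        n += 1
    k = m - n * (n - 1) // 2
    # F^n_k is the indicator of [(k-1)/n, k/n); note P(X_m = 1) = 1/n -> 0.
    return 1 if (k - 1) / n <= omega < k / n else 0

omega = 0.3                    # any fixed point of [0, 1) works
hits = [m for m in range(1, 5001) if X(m, omega) == 1]
print("first hit times:", hits[:6])      # one hit per block n
print("hits up to m = 5000:", len(hits)) # grows without bound with the range
```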

If there were a sequence of i.i.d. random variables $Y_n$ such that $X_n=(Y_1+\ldots+Y_n)/n$, then this sequence would satisfy the weak law of large numbers but not the strong law, because $X_n$ converges to zero in probability but not a.s. However, there is no such sequence of $Y_i$ (indeed, one way to prove that there can be no such $Y_i$ is exactly that if it did exist, then the sample means $X_n$ would converge a.s. by the SLLN).
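Continuing the sketch above (reusing the helper `X`), here is another elementary way to see this, separate from the SLLN argument in the previous paragraph: if such $Y_i$ existed, then $Y_n = nX_n - (n-1)X_{n-1}$, and already $Y_1$ and $Y_2$ would fail to be identically distributed.

```python
# If X_n = (Y_1 + ... + Y_n)/n, then Y_n = n*X_n - (n-1)*X_{n-1}.
# In the enumeration above, X_1 = F^1_1 and X_2 = F^2_1.
def Y1(omega):
    return 1 * X(1, omega)                    # X_1 is 1 everywhere, so Y_1 = 1

def Y2(omega):
    return 2 * X(2, omega) - 1 * X(1, omega)  # takes the value 1 or -1

print(Y1(0.1), Y1(0.9))  # 1 1
print(Y2(0.1), Y2(0.9))  # 1 -1  -> Y_2 is not distributed like Y_1
```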
