Understanding the Law(s) of Large Numbers

Tags: probability, probability-theory

Theorem $1$: Weak Law of Large Numbers

Let $X_1,X_2,…$ be a sequence of i.i.d. random variables, each having mean $E[X_i]=\mu$. Then, for any $\epsilon >0,$

$$\lim_{n \rightarrow \infty}P\left(\left\lvert \frac{X_1+X_2+…+X_n}{n}- \mu\right\rvert > \epsilon \right ) = 0.$$

Question $1$: From what I understand, the WLLN asserts that for $n$ large, $\frac{X_1+X_2+…+X_n}{n}$ is likely to be near $\mu$. Am I thus right in claiming that no matter how large $n$ is, there is no guarantee that $\frac{X_1+X_2+…+X_n}{n}$ will always stay near $\mu?$ More formally, can I say that there is always a non-zero probability (albeit a very small one) that $\left\lvert \frac{X_1+X_2+…+X_n}{n}- \mu \right\rvert > \epsilon $, no matter what values of $n$ and $\epsilon$ we pick?

Theorem $2$: Strong Law of Large Numbers

Let $X_1,X_2,…$ be a sequence of i.i.d. random variables, each having mean $E[X_i]=\mu$. Then:

$$P\left(\lim_{n \rightarrow \infty} \frac{X_1+X_2+…+X_n}{n} = \mu\right)=1.$$

Question $2$: I understand that, for the Strong Law of Large Numbers, we are considering almost-sure convergence. Hence, unlike the weak law, the strong law asserts that the event $\left\lvert \frac{X_1+X_2+…+X_n}{n}- \mu \right\rvert > \epsilon $ almost surely does not happen. Indeed, for all sufficiently large values of $n$, the inequality $\left\lvert \frac{X_1+X_2+…+X_n}{n}- \mu \right\rvert < \epsilon$ always holds (I believe this follows from the epsilon–$N$ definition of the limit of a sequence). Is this interpretation correct?

Best Answer

Regarding Question 1: Yes, the probability of the event $\{ |\frac{X_1+\cdots + X_n}{n} - \mu | > \epsilon \}$ can be nonzero for every fixed $n$. Consider the following example: $X_1,...,X_n \stackrel{iid}{\sim} N(0,1)$. Then the probability of interest can be calculated exactly, since $\frac{X_1+\cdots + X_n}{n} \sim N(0, 1/n)$, and $P\{ |\frac{X_1+\cdots + X_n}{n} | > \epsilon \}= 2\Phi(-\epsilon \sqrt n)$, which converges to zero as $n \to \infty$ but is strictly positive for every $n$. Depending on what you assume about the random variables, you can also quantify how fast this convergence takes place. For example, if the $X_i$'s have finite variance, then Chebyshev's inequality gives $P\{ |\frac{X_1+\cdots + X_n}{n} -\mu | > \epsilon \} \le \mathrm{Var}(X_i)/(n\epsilon^2)$, which tells you how fast this probability decreases. This bound can be improved, i.e., the upper bound can be made to approach zero faster as a function of $n$, if the $X_i$'s have higher-order moments, are Gaussian (or sub-Gaussian), etc.
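A quick numerical sketch of this example (a hypothetical illustration, using only the standard library and the identity $2\Phi(-x)=\operatorname{erfc}(x/\sqrt{2})$): for $X_i \sim N(0,1)$ and $\epsilon = 0.5$, the exact tail probability $2\Phi(-\epsilon\sqrt n)$ shrinks rapidly with $n$ yet never reaches zero, and always sits below the Chebyshev bound $\mathrm{Var}(X_i)/(n\epsilon^2)$:

```python
import math

eps = 0.5
var = 1.0  # X_i ~ N(0, 1), so Var(X_i) = 1

for n in [10, 100, 1000]:
    # Sample mean ~ N(0, 1/n), so P(|mean| > eps) = 2 * Phi(-eps * sqrt(n)),
    # computed here via 2 * Phi(-x) = erfc(x / sqrt(2)).
    exact = math.erfc(eps * math.sqrt(n) / math.sqrt(2))
    chebyshev = var / (n * eps**2)
    print(f"n={n:5d}  exact={exact:.3e}  Chebyshev bound={chebyshev:.3e}")
    # Strictly positive for every finite n, but dominated by the bound:
    assert 0 < exact <= chebyshev
```

The exact probability decays like $e^{-n\epsilon^2/2}$ (much faster than Chebyshev's $1/n$), which is the kind of improvement available under Gaussian or sub-Gaussian assumptions.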

Regarding Question 2: The same example shows that $P\{ |\frac{X_1+\cdots + X_n}{n} - \mu | < \epsilon \}$ need not equal $1$ for any finite $n$, and that is not what the strong law is guaranteeing. To really understand the strong law, you have to remember that random variables are just (measurable) functions mapping a probability space $(\Omega, \mathcal{F},P)$ to the real numbers. For every point $\omega \in \Omega$ of the sample space, $|\frac{X_1(\omega)+\cdots + X_n(\omega)}{n} - \mu |$ defines a sequence of real numbers indexed by $n$. The strong law states that if you look at the subset $A \subset \Omega$ defined by $\omega \in A \iff |\frac{X_1(\omega)+\cdots + X_n(\omega)}{n} - \mu | \to 0 \;\; \text{as} \;\; n\to \infty$, then $P(A)=1$. In other words, the set of exceptional points of $\Omega$ where the sequence $\frac{X_1(\omega)+\cdots + X_n(\omega)}{n}$ does not converge to $\mu$ must have probability zero. It is a worthwhile exercise to think about, and prove rigorously, why this is a stronger statement than the weak law, although normally this would not be taught until you take a course on measure-theoretic probability. See this post if you are interested: Strong law of large numbers implies weak law.
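The sample-path view can be sketched numerically (a rough illustration, not a proof: a finite simulation can only suggest almost-sure convergence). Think of each random seed as fixing one outcome $\omega$; the resulting running means form an ordinary sequence of real numbers, and for almost every $\omega$ that sequence converges to $\mu$. Here the $X_i$ are Uniform$(0,1)$, so $\mu = 0.5$:

```python
import random

mu = 0.5  # mean of Uniform(0, 1)

for seed in [1, 2, 3]:          # three different outcomes omega
    rng = random.Random(seed)   # fixing the seed fixes the whole path
    total = 0.0
    for n in range(1, 100_001):
        total += rng.random()   # X_n(omega) ~ Uniform(0, 1)
    running_mean = total / 100_000
    print(f"omega = seed {seed}: mean of first 100000 draws = {running_mean:.4f}")
    # Each individual path is already close to mu at n = 100000:
    assert abs(running_mean - mu) < 0.01
```

Of course, there do exist outcomes $\omega$ (e.g. the sequence that always draws values above $0.9$) whose running means never approach $\mu$; the strong law says the collection of all such outcomes has probability zero.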
