The fact that the strong law of large numbers implies the weak law of large numbers is contained in the following property:
If $(\Omega,\mathcal F,\Bbb P)$ is a probability space and $\{X_n\}$ is a sequence of real-valued random variables which converges almost everywhere to $X$, then it converges in probability to $X$.
To see this, we may assume that $X_n\to 0$ almost everywhere and $X_n\geq 0$, replacing $X_n$ by $|X_n-X|$ if necessary. The set
$$C:=\bigcap_{p\geq 1}\bigcup_{n\geq 1}\bigcap_{k\geq n}\{\omega\in\Omega,X_k\leq \frac 1p\},$$
has, by hypothesis, measure $1$. Fix $\varepsilon>0$ and choose $p_0$ with $p_0^{-1}\leq \varepsilon$. We have
\begin{align}
\limsup_k\Bbb P\{X_k\geq \varepsilon\}&\leq \limsup_k\Bbb P\{X_k\geq p_0^{-1}\}\\
&\leq\inf_{n\in\Bbb N}\Bbb P\left(\bigcup_{k\geq n}\{X_k\geq p_0^{-1}\}\right)\\
&\leq \Bbb P\left(\bigcap_{n\in\Bbb N}\bigcup_{k\geq n}\{X_k\geq p_0^{-1}\}\right)\\
&\leq \Bbb P(\Omega\setminus C)=0,
\end{align}
which gives the desired result.
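Two of the steps above deserve a word. The passage from the infimum to the probability of the intersection is continuity from above: the events $A_n:=\bigcup_{k\geq n}\{X_k\geq p_0^{-1}\}$ decrease in $n$, so
$$\inf_{n\in\Bbb N}\Bbb P(A_n)=\lim_{n\to\infty}\Bbb P(A_n)=\Bbb P\left(\bigcap_{n\in\Bbb N}A_n\right).$$
For the final step, any $\omega\in\bigcap_n A_n$ satisfies $X_k(\omega)\geq p_0^{-1}$ for infinitely many $k$, which is incompatible with the defining condition of $C$ at $p=2p_0$ (namely $X_k(\omega)\leq (2p_0)^{-1}$ for all large $k$); hence $\bigcap_n A_n\subseteq\Omega\setminus C$.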
Before I start, I have to say that @Nate Eldredge is correct that you are trying to grasp some ideas that are several steps beyond your
mathematical background. But there are some things to say that will get you started thinking in useful directions.
Events of probability zero. In a sense, I can and will tell you it is 'impossible' to get an infinite run of Heads when tossing a coin.
The probability of $n$ Heads in a row is $(1/2)^n,$ so your chances
of seeing 1000 Heads in a row are about $9.332636 \times 10^{-302}.$ So for
all practical purposes, even a run of 1000 Heads is not going to happen. The only sensible probability to assign to an infinite
run of Heads is 0. By 'sensible probability' I mean it is the only way to have a consistent mathematical system in which the total
probability of all conceivable events adds to 1. (As in Kolmogorov's axioms of probability.)
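(If you want to check such numbers, a short Python computation of $(1/2)^n$ suffices; this is just an illustration.)

```python
# Probability of n Heads in a row with a fair coin: (1/2)**n.
for n in [10, 100, 1000]:
    print(n, 0.5 ** n)
# n = 1000 gives about 9.33e-302: astronomically small, yet still positive.
# An infinite run gets the limit of these probabilities, namely 0.
```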
I don't claim you can't conceive of the occurrence of an infinite run of Heads. However, there are very many other conceivable events
of probability zero when tossing coins: an infinite sequence of Tails,
an infinite repetition of HTHTHT..., an infinite repetition of TTHHTTHH..., and so on. If you assign even a tiny positive probability to each of these, I hope you can see how the total probability of very many of them would soon exceed 1.
Convergence 'in probability.' Ordinary mathematical convergence
deals with deterministic sequences. For example, in the expression $\lim_{n \rightarrow \infty} (1 + 2n)/n = 2,$ I know that element
$n = 20$ in the sequence is $(1 + 2n)/n = 41/20 = 2.05.$ So it is
easy to establish that, with increasing $n,$ the sequence $a_n = (1 + 2n)/n$ gets, and remains, as close to $2$ as we like.
Now let's consider tossing a fair coin. Let $p_n$ be the proportion
of Heads among the first $n$ tosses. There is a sense in which
the 'limit' of $p_n$ as $n \rightarrow \infty$ is $1/2.$ But it cannot be ordinary mathematical convergence because $p_n$ is a
random quantity, not a deterministic one.
The 'cure' is to look at the sequence $B_n = P\{|p_n - 1/2| \le \epsilon\}.$ The sequence $B_n$ is a deterministic numerical
sequence. For any positive $\epsilon,$ no matter how small,
one can compute the probability. And one can show
$\lim_{n \rightarrow \infty} B_n = 1.$ This is not the place
for a formal proof. (You can look up the Weak Law of Large Numbers.)
However, for $\epsilon = 0.01,$ one can compute $B_n$ for
increasing values of $n.$ The table below shows a few (rounded to
4 places). For $n = 100,000,$ we get $B_{100,000} \approx 1,$ to about 9 decimal places.
     n      B_n
   100   0.2356
  1000   0.4933
 10000   0.9556
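These values are easy to recompute; here is a short sketch, assuming SciPy is available, with $B_n$ read as $P\{|p_n - 1/2| \le 0.01\}$ for a Binomial$(n, 1/2)$ count of Heads:

```python
# Recompute B_n = P(|p_n - 1/2| <= 0.01), where the number of Heads in
# n tosses is Binomial(n, 1/2) and p_n is the proportion of Heads.
import math
from scipy.stats import binom

eps = 0.01
for n in [100, 1000, 10000, 100000]:
    lo = math.ceil(n * (0.5 - eps))   # smallest Head count within eps of 1/2
    hi = math.floor(n * (0.5 + eps))  # largest Head count within eps of 1/2
    b_n = binom.cdf(hi, n, 0.5) - binom.cdf(lo - 1, n, 0.5)
    print(n, round(b_n, 4))
```

For $n = 100{,}000$ this prints $1.0$, matching the statement above that $B_{100{,}000} \approx 1$ to about 9 decimal places.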
This kind of convergence is called convergence in probability.
One writes $\operatorname{plim}_{n \rightarrow \infty} p_n = 1/2$ or
$p_n \stackrel{p}{\rightarrow} 1/2.$ From a mathematical
point of view it is a relatively easy kind of convergence to
discuss because, for each individual value of $n,$ the computation of $B_n$
involves looking only at a binomial distribution with
parameters $n$ and $1/2.$
'Almost sure' convergence. The Strong Law of Large Numbers
requires a more sophisticated probability structure.
It states that $P\{\lim_{n \rightarrow \infty} p_n = 1/2\} = 1,$
also written
$p_n \stackrel{a.s.}{\rightarrow} 1/2.$ This formulation of
the convergence of a sequence of random variables requires
a probability structure that can deal with all sequences $p_n$
simultaneously. In essence, it has to make sense not only of the probabilities of realistic events observable in the real world, but also, at the same time, of infinitely many 'conceivable' events of probability zero, including the one you mentioned in your question.
("Almost sure" means "with probability 1." The terminology is
intended to exclude 'conceivable' events of probability 0.
One can argue whether "almost sure" is the best terminology
for the general public to grasp, but it is widely used in advanced probability theory.)
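If it helps intuition, here is a minimal simulation sketch: it follows a single simulated path of fair-coin tosses and prints the running proportion of Heads at a few checkpoints. The Strong Law says that, on almost every such path, this proportion settles down to $1/2$.

```python
# Follow one simulated path of fair-coin tosses and print the running
# proportion of Heads at a few checkpoints.
import random

random.seed(1)                     # fixed seed, so the run is reproducible
heads = 0
for n in range(1, 100001):
    heads += random.randint(0, 1)  # 1 = Heads, 0 = Tails
    if n in (100, 1000, 10000, 100000):
        print(n, heads / n)
```

Different seeds give different paths, but with probability $1$ the path eventually stays as close to $1/2$ as you like.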
Note: In this discussion I have tried to compromise between
precise technical terminology and words that make sense in ordinary
English for people who have not studied advanced probability and measure
theory. This is not a treatise in measure theory. Nevertheless,
I anticipate comments from people who believe they can be
simultaneously intelligible to an elementary audience and more
technically precise. I would never claim my compromises are optimal.
Best Answer
Sometimes a weak law of large numbers gives you better quantitative information.
As a very simple example, suppose that the $X_i$ are independent with common mean $\mu$ and finite variance $\sigma^2$, and let $S_n = X_1 + \cdots + X_n$. Then it is easy to see that $\operatorname{Var}(\frac{S_n}{n}) = \frac{\sigma^2}{n}$, and Chebyshev's inequality gives that for any $\epsilon > 0$, we have $$P\left(\left|\frac{S_n}{n}-\mu\right| > \epsilon\right) \le \frac{\sigma^2}{n \epsilon^2}.$$
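For completeness, the variance computation uses only the independence of the $X_i$:
$$\operatorname{Var}\left(\frac{S_n}{n}\right)=\frac{1}{n^2}\operatorname{Var}(S_n)=\frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)=\frac{n\sigma^2}{n^2}=\frac{\sigma^2}{n}.$$
In the coin-tossing example above ($\sigma^2 = 1/4$, $\epsilon = 0.01$) the bound reads $P\{|p_n - 1/2| > 0.01\} \le 2500/n$: much cruder than the exact binomial computation, but explicit for every finite $n$, which is exactly the kind of quantitative information an almost-sure statement by itself does not provide.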