[Math] Law of large numbers – almost sure convergence

law-of-large-numbersprobability

Quoting Wikipedia may be considered not a very nice partice, but:

When a fair coin is flipped once, the theoretical probability that the
outcome will be heads is equal to 1/2. Therefore, according to the law
of large numbers, the proportion of heads in a "large" number of coin
flips "should be" roughly 1/2. In particular, the proportion of heads
after n flips will almost surely converge to 1/2 as n approaches
infinity.

My problem with LLN is that it basically says the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.

It makes sense, but you can't tell me it's impossible to get tails only in a series of infinite coin tosses. What's worse, there's a very weird terminology used, namely: it converges almost surely. What is it even supposed to mean? This term is pretty fuzzy and sounds like as if I said 'quite probably it converges' – this is supposed to be mathematics, not opinions about likelyhood of an event.

The point is – if something happens with probability equal to $1$, it doesn't mean it will happen. I might be wrong, but I just don't feel comfortable with it. Either something converges or not, and if the probability converges to $1$, then it doesn't tell me anything, becuase other things can happen too. Even if it's equal to $1$.

Best Answer

Before I start, I have to say that @Nate Eldredge is correct that you are trying to grasp some ideas that are several steps beyond your mathematical background. But there are some things to say that will get you started thinking in useful directions.

Events of probability zero. In a sense, I can and will tell you it is 'impossible' to get an infinite run of Heads tossing a coin. The probability of $n$ Heads in a row is $(1/2)^n,$ so your chances of seeing 1000 Heads in a row are $9.332636 \times 10^{-302}.$ So for all practical purposes, even a run of 1000 Heads is not going to happen. The only sensible probability to assign to an infinite run of Heads is 0. By 'sensible probability' I mean it is the only way to have a consistent mathematical system in which the total probability of all conceivable events adds to 1. (As in Kolmogorov's axioms of probability.)

I don't claim you can't conceive of the occurrence of an infinite run of Heads. However, there are very many other conceivable events of probability zero tossing coins: an infinite sequence of Tails, an infinite repetition of HTHTHT..., an infinite repetition of TTHHTTHH..., and so on. If you assign even a tiny positive probability to each of these, I hope you can see how the total probability of very many of them would soon exceed 1.

Convergence 'in probability.' Ordinary mathematical convergence deals with deterministic events. For example, in the expression $\lim_{n \rightarrow \infty} (1 + 2n)/n = 2,$ I know that element $n = 20$ in the sequence is $(1 + 2n)/n = 41/20 = 2.05.$ So it is easy to establish that, with increasing $n,$ the sequence $a_n = (1 + 2n)/n$ gets--and remains--as close to $2$ as we like.

Now let's consider tossing a fair coin. Let $p_n$ be the proportion of Heads among the first $n$ tosses. There is a sense in which the 'limit' of $p_n$ as $n \rightarrow \infty$ is $1/2.$ But it cannot be ordinary mathematical convergence because $p_n$ is a random quantity, not a deterministic one.

The 'cure' is to look at the sequence $B_n = P\{|p_n - 1/2| < \epsilon\}.$ The sequence $B_n$ is a deterministic numerical sequence. For any positive $\epsilon,$ no matter how small, one can compute the probability. And one can show $\lim_{n \rightarrow \infty} B_n = 1.$ This is not the place for a formal proof. (You can look up the Weak Law of Large Numbers.) However, for $\epsilon = 0.01,$ one can compute $B_n$ for increasing values of $n.$ The table below shows a few (rounded to 4 places). For $n = 100,000,$ we get $B_{100,000} \approx 1,$ to about 9 decimal places.

    n        B
  100   0.2356
 1000   0.4933
10000   0.9556

This kind of convergence is called convergence in probability. On writes $\text{plim}_{n \rightarrow} p_n = 1/2$ or $p_n \stackrel{p}{\rightarrow} 1/2.$ From a mathematical point of view it is a relatively easy kind of convergence to discuss because, for each individual value of $n,$ the computation of $B_n$ involves looking only at a binomial distribution with parameters $n$ and $1/2.$

'Almost sure' convergence. The Strong Law of Large Numbers requires a more sophisticated probability structure. It states that $P\{\lim_{n \rightarrow \infty} p_n = 1/2\} = 1,$ also written $p_n \stackrel{a.s.}{\rightarrow} 1/2.$ This formulation of the convergence of a sequence of random variables requires a probability structure that can deal with all sequences $p_n$ simultaneously. In essence, it has to make sense not only of probabilities of realistic events observable in the real world, but at the same time with infinitely many 'conceivable' events of probability zero, including the one you mentioned in your question. ("Almost sure" means "with probability 1." The terminology is intended to exclude 'conceivable' events of probability 0. One can argue whether "almost sure" is the best terminology for the general public to grasp, but it is widely used in advanced probability theory.)

Note: In this discussion I have tried to compromise between precise technical terminology and words that make sense in ordinary English for people who have not studied advanced probability and measure theory. This is not a treatise in measure theory. Nevertheless, I anticipate comments from people who believe they can be simultaneously intelligible to an elementary audience and more technically precise. I would never claim my compromises are optimal.