Probability – Is the Law of Large Numbers Empirically Proven?

applications, law-of-large-numbers, probability, statistics

Does this reflect the real world, and what is the empirical evidence behind it?

[Wikipedia illustration]

Layman here so please avoid abstract math in your response.

The Law of Large Numbers states that the average of the results from multiple trials will tend to converge to its expected value (e.g. 0.5 in a coin toss experiment) as the sample size increases. The way I understand it, while the first 10 coin tosses may result in an average closer to 0 or 1 rather than 0.5, after 1000 tosses a statistician would expect the average to be very close to 0.5 and definitely 0.5 with an infinite number of trials.

Given that a coin has no memory and each coin toss is independent, what physical laws would determine that the average of all trials will eventually reach 0.5? More specifically, why does a statistician believe that a random event with 2 possible outcomes will produce a roughly equal number of both outcomes over, say, 10,000 trials? What prevents the coin from landing on heads 9900 times instead of 5200?

Finally, since gambling and insurance institutions rely on such expectations, are there any experiments that have conclusively shown the validity of the LLN in the real world?

EDIT: I do differentiate between the LLN and the Gambler's fallacy. My question is NOT whether or why any specific outcome or series of outcomes becomes more likely with more trials (that is obviously false), but why the mean of all outcomes tends toward the expected value.

FURTHER EDIT: LLN seems to rely on two assumptions in order to work:

  1. The universe is indifferent towards the result of any one trial, because each outcome is equally likely
  2. The universe is NOT indifferent towards any one particular outcome coming up too frequently and dominating the rest.

Obviously, we as humans would label a 50/50 or similar distribution in a coin toss experiment "random", but if heads or tails turns out to account for, say, 60-70% of results after thousands of trials, we would suspect something is wrong with the coin and that it isn't fair. Thus, if the universe is truly indifferent towards the average of large samples, there is no way we can have both true randomness and consistent predictions: there will always be a suspicion of bias unless the total distribution is somehow kept in check by something that preserves the relative frequencies.

Why is the universe NOT indifferent towards big samples of coin tosses? What is the objective reason for this phenomenon?

NOTE: A good explanation would not be circular: justifying probability with probabilistic assumptions (e.g. "it's just more likely"). Please check your answers, as most of them fall into this trap.

Best Answer

Reading between the lines, it sounds like you are committing the fallacy of the layman interpretation of the "law of averages": that if a coin comes up heads 10 times in a row, then it needs to come up tails more often from then on, in order to balance out that initial asymmetry.

The real point is that no divine presence needs to take corrective action in order for the average to stabilize. The simple reason is attenuation: once you've tossed the coin another 1000 times, the effect of those initial 10 heads has been diluted to mean almost nothing. What used to look like 100% heads is now a small blip only strong enough to move the needle from 50% to 51%.
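To see the arithmetic behind that dilution, here is a minimal sketch (assuming, purely for illustration, that the later 1000 tosses land exactly at their expected 50% rate):

```python
# Dilution of an initial streak: 10 heads followed by 1000 further tosses.
# (Assumes the later tosses land exactly at their expected rate of 50% heads.)
initial_heads = 10
later_tosses = 1000
expected_later_heads = later_tosses * 0.5   # ~500 heads from a fair coin

average = (initial_heads + expected_later_heads) / (initial_heads + later_tosses)
print(average)   # ~0.505: the all-heads start now barely moves the needle
```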

Now combine this observation with the easily verified fact that 9900 heads out of 10000 is simply a less common combination than 5000 out of 10000. The reason is combinatorial: there is far less freedom in hitting an extreme target than a moderate one.

To take a tractable example, suppose I ask you to flip a coin 4 times and get 4 heads. If you flip tails even once, you've failed. But if instead I ask you to aim for 2 heads, you still have options (albeit slimmer ones) no matter how the first two flips turn out. Numerically, we can see that 2 out of 4 can be achieved in 6 ways: HHTT, HTHT, HTTH, THHT, THTH, TTHH. But the 4-out-of-4 goal can be achieved in only one way: HHHH. If you work out the numbers for 9900 out of 10000 versus 5000 out of 10000 (or any specific number in that neighbourhood), that disparity becomes truly immense.
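For anyone who wants to check those counts, a small sketch using Python's built-in `math.comb` reproduces the 4-flip numbers and gives a sense of how lopsided the 10,000-toss comparison is:

```python
from math import comb

# The 4-flip example: how many sequences hit each target exactly.
print(comb(4, 2))   # 6 ways to get exactly 2 heads in 4 flips
print(comb(4, 4))   # 1 way to get exactly 4 heads in 4 flips

# The 10,000-toss comparison: exactly 5000 heads versus exactly 9900 heads.
ratio = comb(10_000, 5_000) // comb(10_000, 9_900)
print(len(str(ratio)))   # the ratio itself is a number with thousands of digits
```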

To summarize: it takes no conscious effort to get an empirical average to tend towards its expected value. In fact it would be fair to think in the exact opposite terms: the effect that requires conscious effort is forcing the empirical average to stray from its expectation.
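As for the empirical side of the question: anyone can run the experiment themselves in seconds. Below is a small simulation sketch, with a pseudorandom generator standing in for a physical coin; no corrective mechanism appears anywhere, yet the running average settles near 0.5.

```python
import random

# Track the running proportion of heads over many simulated fair-coin tosses.
random.seed(0)   # fixed seed only so the run is reproducible

heads = 0
checkpoints = {10, 100, 1_000, 10_000, 100_000}
for n in range(1, 100_001):
    heads += random.randint(0, 1)   # 1 = heads, 0 = tails, each with probability 1/2
    if n in checkpoints:
        print(f"after {n:>6} tosses: proportion of heads = {heads / n:.4f}")
```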
