Why are you less likely to roll at least 1/6 of the dice as 6 when the number of dice increases?

Tags: probability, probability-theory, statistical-inference, statistics

So, I recently watched a Vsauce video discussing a collaboration between Sir Isaac Newton and Samuel Pepys on a probability problem: the probability of rolling at least one six on six six-sided dice, compared to the probability of rolling at least 2 sixes on 12 dice or at least 3 sixes on 18 dice. The answer they arrived at was that the probability of rolling at least N/6 sixes on N dice decreases as N increases: the probabilities are .6651 for 6 dice, .6187 for 12 dice, and .5973 for 18 dice.

However, this seems counterintuitive to me. As the size of a sample grows, the more likely it is to conform to the true probability, right? That's the basis of a lot of the statistical analysis that underlies things like p-values. Why don't the probabilities increase to reflect this, instead of decreasing the way they actually do?
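The three figures quoted in the question can be checked with a short binomial calculation. This is only a sketch: the function name `prob_at_least_k_sixes` is made up for illustration, and it assumes fair, independent dice.

```python
from math import comb

def prob_at_least_k_sixes(n_dice: int, k: int) -> float:
    """P(at least k sixes among n_dice fair six-sided dice)."""
    # Count the outcomes with fewer than k sixes: choose which j dice show
    # a six, and let each of the remaining dice show any of the other 5 faces.
    fewer = sum(comb(n_dice, j) * 5 ** (n_dice - j) for j in range(k))
    return 1 - fewer / 6 ** n_dice

for n in (6, 12, 18):
    print(n, round(prob_at_least_k_sixes(n, n // 6), 4))
```

The loop asks for at least `n // 6` sixes on `n` dice, which is the Newton-Pepys setup described above.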

Best Answer

Let's use a simpler example: tossing a fair coin. As the coin is fair, the probability of heads equals the probability of tails for any single trial, and is $p = 1/2$.

Now, if we toss the coin $10$ times, what is the probability of exactly $5$ heads and $5$ tails? It is $$\frac{63}{256} \approx 0.246094.$$ But if we toss the coin $11$ times, is it possible to get an equal number of heads and tails? No, because there is no such thing as getting $5.5$ heads. You might argue that it is only fair to limit ourselves to an even number of coin tosses when considering the probability of getting equal heads and tails.

All right then, so say there are $2n$ coin tosses and we want to understand the probability of getting exactly $n$ heads and $n$ tails. In the general case, this probability is $$p(n) = \Pr[H = T] = \binom{2n}{n} 2^{-2n} = \frac{(2n)!}{(n!)^2 2^{2n}}.$$ We can show that this is a strictly decreasing function of $n$ by looking at the ratio of consecutive terms: $$\begin{align*} \frac{p(n+1)}{p(n)} &= \frac{(2n+2)!}{(2n)!} \left(\frac{n!}{(n+1)!}\right)^2 \frac{2^{2n}}{2^{2n+2}} \\ &= \frac{(2n+2)(2n+1)}{4(n+1)^2} \\ &= \frac{2n+1}{2n+2} \\ &= 1 - \frac{1}{2n+2} \\ &< 1, \end{align*}$$ so it follows that $$p(n+1) < p(n)$$ for all positive integers $n$. But while interesting, this calculation doesn't really get to the heart of why this is happening.
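As a sanity check on the ratio above, here is a minimal sketch using exact rational arithmetic. The helper name `p_equal` is an assumption for illustration; it is just $p(n)$ written as code.

```python
from fractions import Fraction
from math import comb

def p_equal(n: int) -> Fraction:
    """Exact probability of n heads and n tails in 2n fair coin tosses."""
    return Fraction(comb(2 * n, n), 4 ** n)

# The ratio of consecutive terms should equal (2n+1)/(2n+2), which is below 1.
for n in range(1, 50):
    ratio = p_equal(n + 1) / p_equal(n)
    assert ratio == Fraction(2 * n + 1, 2 * n + 2)
    assert ratio < 1

print(p_equal(5))  # 63/256, the 10-toss value quoted earlier
```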

Let's think of the situation like this. Instead of asking for the probability that we get exactly the same number of heads and tails, consider the probability that the proportion of heads falls within a certain margin of $1/2$. After all, if we flip the coin $10$ times and get exactly $5$ each of heads and tails, would it be fair to insist that in $100$ flips we must get exactly $50$ of each? Or is it more reasonable to allow some margin, given that the nearest other outcomes in the $10$-flip example are $(4,6)$ and $(6,4)$? It seems reasonable to include the probability of getting anywhere between $45$ and $55$ heads, not just exactly $50$. If $X$ is the number of heads in $100$ tosses, this probability is $$\Pr[45 \le X \le 55] = \frac{1}{2^{100}} \sum_{x=45}^{55} \binom{100}{x} = \frac{28868641920228451421269389993}{39614081257132168796771975168} \approx 0.728747.$$ This is bigger than the probability we computed for exactly $5$ of each in $10$ flips. And if we use the same reasoning for $1000$ tosses, we have $$\Pr[450 \le X \le 550] \approx 0.998608.$$

So as you might guess by now, the issue is that when we increase the number of trials, the probability of any single outcome tends to decrease because there are more possible outcomes. That is why we see a decrease in the probability of getting exactly $n$ heads and $n$ tails out of $2n$ trials.

To think about this graphically, imagine the number of heads as a proportion of the total number of trials plotted horizontally, and the probability plotted vertically. Here's what the plot looks like for increasing $n$:

[Animation: probability of each proportion of heads for increasing $n$, fixed vertical axis]

This animation is shown with a fixed vertical axis, so you can see that each possible outcome gets less and less likely, but the probability becomes increasingly concentrated at the middle. If we rescale the vertical axis to adjust for the maximum probability for any outcome, we get

[Animation: the same plot with the vertical axis rescaled to the maximum probability]

This shows how the peak gets narrower with increasing $n$. So if you want the probability at the peak, the first animation shows that it gets smaller. But if you want the probability that the outcomes fall within a fixed range of proportions (i.e., within a pre-specified vertical strip) about the center, it increases because the "bump" is getting narrower. It is in this sense that, as the number of flips increases, the outcome tends toward the true proportion $1/2$.
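The fixed-strip probabilities quoted earlier can be reproduced with a short calculation. This is a sketch under stated assumptions: the function name `prob_within_margin` and its parameters are hypothetical, and the margin of $1/20$ is chosen because it gives the ranges $45$ to $55$ out of $100$ and $450$ to $550$ out of $1000$.

```python
from fractions import Fraction
from math import ceil, comb, floor

def prob_within_margin(n: int, p: Fraction, margin: Fraction) -> float:
    """P(|X/n - p| <= margin) for X ~ Binomial(n, p), summed exactly."""
    lo = max(0, ceil(n * (p - margin)))   # smallest count inside the strip
    hi = min(n, floor(n * (p + margin)))  # largest count inside the strip
    total = sum(comb(n, x) * p ** x * (1 - p) ** (n - x)
                for x in range(lo, hi + 1))
    return float(total)

half, strip = Fraction(1, 2), Fraction(1, 20)
for n in (10, 100, 1000):
    print(n, round(prob_within_margin(n, half, strip), 6))
```

With $10$ tosses the strip contains only the single outcome of $5$ heads, so the first value is the exact-tie probability $63/256$. Passing `Fraction(1, 6)` for `p` would apply the same idea to the proportion of sixes in the dice setting of the question.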
