Probability – How Many Rolls Ensure Sum Greater Than 100 with 99% Probability?

diceexpected valueprobabilityrandom variablesvariance

I roll a pair of fair dice $n$ times, and calculate the sum of all $2n$ faces which come up:

  • Suppose each roll of each die is independent of other rolls.
  • How many rolls are sufficient to ensure, with probability $99\ \%$, that the sum is greater than $100\ ?$.

I've calculated that the expected value and variance of sum are
\mathbb{E}\left(X\right) = 7n\quad\mbox{and}\quad \mathrm{Var}(X)=\frac{35}{6}n

where $X$ represents the random variable of sum of the $2n$ faces.

I think I need to find a lower bound on $\operatorname{Pr}\left(X > 100\right)$. I've attempted applying Markov's inequality:
\operatorname{Pr}\left(X \geq 100\right)\leq \frac{\mathbb{E}(X)}{100}\,,

which isn't anywhere near what I want. Any tips $?$. This may be a very basic question, but I'm not good at probability at all.

Best Answer

Let $X_1,...,X_k$ be i.i.d. Uniform($\{1,...,m\}).$ Then the distribution of $S_k=X_1+...+X_k$ is given exactly by a recursion proved in this paper$^\dagger$, which expresses this distribution in terms of polynomial coefficients. However, an approximation based on the CLT, in our case of $m=6$, is that $S_k$ is approximately Normal with mean $\mu_k={7 k\over 2}$ and standard deviation $\sigma_k=\sqrt{35 k\over 12}$; so $$P\left(S_k\gt 100\right)\approx 1-\Phi\left({100-\mu_k\over \sigma_k }\right).$$

Here's a picture, for $k=2n$:

plots of exact and approximate P(S_{2n}>100) vs. n

The least $n$ that gives $P\left(S_{2n}\gt 100\right)\ge 0.99$ is $n=18$, according to both approaches.

Here I've posted my Python code implementing both the exact and approximate approaches, with the following numerical output:

    n=15  P(S>100)=0.6840  normal_approx=0.7035
    n=16  P(S>100)=0.8825  normal_approx=0.8929
    n=17  P(S>100)=0.9685  normal_approx=0.9718
--> n=18  P(S>100)=0.9938  normal_approx=0.9944 <--
    n=19  P(S>100)=0.9991  normal_approx=0.9991
    n=20  P(S>100)=0.9999  normal_approx=0.9999

$^\dagger$ Caiado, Camila & Rathie, Pushpa. (2007). "Polynomial coefficients and distribution of the sum of discrete uniform variables"

Here is a quick summary of how $P(S_k=s)$, and hence $P(S_k>100)$, is related to polynomial coefficients via probability generating functions (PGFs):

The PGF for each $X_i\sim$ Uniform({$1,...,m$}) is just $$G(t):=\sum_{x}P(X=x)t^x={1\over m}(t^1+t^2+...+t^m)={1\over m}t(1+t+...+t^{m-1})$$ so, by the product rule for the PGF of a sum of independent r.v.s, the PGF for $S_k$ is simply $$\sum_{s}P(S_k=s) t^s = G(t)^k = {1\over m^k}t^k(1+t+...+t^{m-1})^k.$$ Therefore $$P(S_k=s) = {1\over m^k} \times\text{coefficient of $t^s$ in $t^k(1+t+...+t^{m-1})^k$}\\ ={1\over m^k} \times\text{coefficient of $t^{s-k}$ in $(1+t+...+t^{m-1})^k$}$$ which are precisely the coefficients for which a recursion is provided in the cited paper.

Related Question