I roll a pair of fair dice $n$ times, and calculate the sum of all $2n$ faces which come up:
- Suppose each roll of each die is independent of other rolls.
- How many rolls are sufficient to ensure, with probability $99\ \%$, that the sum is greater than $100\ ?$.
I've calculated that the expected value and variance of sum are
$$
\mathbb{E}\left(X\right) = 7n\quad\mbox{and}\quad \mathrm{Var}(X)=\frac{35}{6}n
$$
where $X$ represents the random variable of sum of the $2n$ faces.
I think I need to find a lower bound on $\operatorname{Pr}\left(X > 100\right)$. I've attempted applying Markov's inequality:
$$
\operatorname{Pr}\left(X \geq 100\right)\leq \frac{\mathbb{E}(X)}{100}\,,
$$
which isn't anywhere near what I want. Any tips $?$. This may be a very basic question, but I'm not good at probability at all.
Best Answer
Let $X_1,...,X_k$ be i.i.d. Uniform($\{1,...,m\}).$ Then the distribution of $S_k=X_1+...+X_k$ is given exactly by a recursion proved in this paper$^\dagger$, which expresses this distribution in terms of polynomial coefficients. However, an approximation based on the CLT, in our case of $m=6$, is that $S_k$ is approximately Normal with mean $\mu_k={7 k\over 2}$ and standard deviation $\sigma_k=\sqrt{35 k\over 12}$; so $$P\left(S_k\gt 100\right)\approx 1-\Phi\left({100-\mu_k\over \sigma_k }\right).$$
Here's a picture, for $k=2n$:
The least $n$ that gives $P\left(S_{2n}\gt 100\right)\ge 0.99$ is $n=18$, according to both approaches.
Here I've posted my Python code implementing both the exact and approximate approaches, with the following numerical output:
$^\dagger$ Caiado, Camila & Rathie, Pushpa. (2007). "Polynomial coefficients and distribution of the sum of discrete uniform variables"
Here is a quick summary of how $P(S_k=s)$, and hence $P(S_k>100)$, is related to polynomial coefficients via probability generating functions (PGFs):
The PGF for each $X_i\sim$ Uniform({$1,...,m$}) is just $$G(t):=\sum_{x}P(X=x)t^x={1\over m}(t^1+t^2+...+t^m)={1\over m}t(1+t+...+t^{m-1})$$ so, by the product rule for the PGF of a sum of independent r.v.s, the PGF for $S_k$ is simply $$\sum_{s}P(S_k=s) t^s = G(t)^k = {1\over m^k}t^k(1+t+...+t^{m-1})^k.$$ Therefore $$P(S_k=s) = {1\over m^k} \times\text{coefficient of $t^s$ in $t^k(1+t+...+t^{m-1})^k$}\\ ={1\over m^k} \times\text{coefficient of $t^{s-k}$ in $(1+t+...+t^{m-1})^k$}$$ which are precisely the coefficients for which a recursion is provided in the cited paper.