$95\%$ confidence for repeated Bernoulli trials

binomial-distribution, probability

For a known success probability $p$ of an event $X_p$, I know that the expected number of Bernoulli trials for the outcome to occur at least once is $1/p$.$^1$ For example, the expected number of die rolls to roll a 1 on a 4-sided die is $\frac{1}{\frac{1}{4}}=4$.

From some experimentation, I have empirically figured out that the probability of this outcome actually occurring tends to be ~$68\%$. In other words, the chance that at least one 1 is rolled on a 4-sided die within the expected number of trials (four, from above) is $\frac{1}{4} + \frac{1}{4} \cdot \frac{3}{4} + \frac{1}{4} \cdot \frac{3}{4} \cdot \frac{3}{4} + \frac{1}{4} \cdot \frac{3}{4} \cdot \frac{3}{4} \cdot \frac{3}{4} = 0.68359375$, which looks suspiciously like the fraction of the population within 1 standard deviation of the mean in a normal distribution.
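The ~$68\%$ figure is easy to check by simulation; here is a minimal Monte Carlo sketch (the function name and the run count of 200 000 are my own choices, not from the question):

```python
import random

def at_least_one_success(p: float, n_trials: int) -> bool:
    """Simulate n_trials Bernoulli(p) trials; True if any one succeeds."""
    return any(random.random() < p for _ in range(n_trials))

# Estimate Pr[at least one "1" in 4 rolls of a 4-sided die].
runs = 200_000
hits = sum(at_least_one_success(0.25, 4) for _ in range(runs))
print(hits / runs)  # close to 175/256 = 0.68359375
```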

I therefore guess that figuring out the number of trials required for $95\%$ confidence that the outcome has occurred at least once, and likewise for $99.5\%$ confidence, should be fairly easy, but this is where my ability to work it out myself ends.

For a given known probability $p$ and a confidence probability $c$, what is the required number of trials of the event $X_p$ for the probability to be at least $c$ that the event has occurred at least once?

My attempts to search for this keep turning up material about confidence intervals, which appears to concern the confidence that a probability has some value. I already know the probability in the die example is $\frac{1}{4}$.

$^1$Because $E(X_p) = \sum_{x=1}^\infty x \cdot p \cdot (1-p)^{x-1} = \frac{1}{p}$ (for $0 < p < 1$)

Best Answer

You are interested in calculating $$\begin{align} \Pr[X \le \operatorname{E}[X]] &= \sum_{x=1}^{\lfloor \operatorname{E}[X] \rfloor} p (1-p)^{x-1} = 1 - (1-p)^{\lfloor \operatorname{E}[X]\rfloor} = 1 - (1-p)^{\lfloor 1/p \rfloor} \end{align}$$ where $\operatorname{E}[X] = 1/p$ is the expected number of Bernoulli trials until the first success. Hence the exact probability in the case where $p = 0.25$ is $\Pr[X \le 4] = \frac{175}{256}$.
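This exact probability can be computed with rational arithmetic; the sketch below (function name is mine) uses `fractions.Fraction` to avoid floating-point error:

```python
from fractions import Fraction

def prob_success_by_expectation(p: Fraction) -> Fraction:
    """Pr[X <= floor(1/p)] = 1 - (1-p)**floor(1/p) for X ~ Geometric(p)."""
    n = int(1 / p)  # floor(E[X]) = floor(1/p); int() floors positives
    return 1 - (1 - p) ** n

print(prob_success_by_expectation(Fraction(1, 4)))  # 175/256
```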

However, the fact that the resulting probability is a nontrivial function of the Bernoulli trial success probability $p$ means that you cannot infer a relationship between this result and the cumulative distribution of a normal random variable. Any such coincidence is just that: a coincidence. The value $$\Pr[|Z| \le 1] \approx 0.682689$$ where $Z \sim \operatorname{Normal}(0,1)$ is "close to" but not equal to the aforementioned probability.

It is worth showing the plot of $1 - (1-p)^{\lfloor 1/p \rfloor}$ (blue curve):

[Plot of $1 - (1-p)^{\lfloor 1/p \rfloor}$ (blue) and $1 - (1-p)^{1/p}$ (orange) as functions of $p$]

The orange curve is $1 - (1-p)^{1/p}$. The limiting value as $p \to 0$ is easily found by noting that $$\lim_{p \to 0^+} 1 - (1-p)^{1/p} = \lim_{x \to \infty} 1 - \left(1 - \frac{1}{x}\right)^x = 1 - \frac{1}{e} \approx 0.632121.$$
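The limit can also be checked numerically; as $p$ shrinks, $1 - (1-p)^{1/p}$ approaches $1 - 1/e$:

```python
import math

# As p -> 0+, 1 - (1-p)**(1/p) should approach 1 - 1/e ~ 0.632121.
for p in (0.1, 0.01, 0.001, 1e-6):
    print(p, 1 - (1 - p) ** (1 / p))

print(1 - 1 / math.e)  # the limiting value
```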

As to your actual question, you want to know the smallest value $x^*$ such that $$\Pr[X \le x^*] \ge c$$ for a given $p$ and $c$, both in $(0,1)$. This is simply $$1 - (1-p)^{x^*} \ge c,$$ or $$x^* \ge \frac{\log (1-c)}{\log (1-p)},$$ hence the minimum such $x^*$ is $$x^* = \left\lceil \frac{\log (1-c)}{\log (1-p)} \right\rceil.$$
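The ceiling formula translates directly into code; a minimal sketch (function name is mine), applied to the $4$-sided-die example with the two confidence levels from the question:

```python
import math

def trials_for_confidence(p: float, c: float) -> int:
    """Smallest n with 1 - (1-p)**n >= c, i.e. ceil(log(1-c)/log(1-p))."""
    return math.ceil(math.log(1 - c) / math.log(1 - p))

print(trials_for_confidence(0.25, 0.95))   # 11
print(trials_for_confidence(0.25, 0.995))  # 19
```

For $p = 1/4$, eleven rolls give $1 - 0.75^{11} \approx 0.958 \ge 0.95$, while ten give only $\approx 0.944$, confirming the ceiling is needed.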
