Probability of needing to flip an unfair coin at least X times if I want Y heads

binomial distributionprobability

If I know the probability of success for a set of independent events, as well as the number of successes I want (i.e. the number of slots I need to fill), how do I compute the probability of needing at least a certain number of trials to fill up all my slots if I don't know the total number of trials? Is a binomial distribution the correct approach here?

As an example, let's say I have an unfair coin and the probability of landing heads is $p=0.4$. What's the probability that I'll need to flip this coin at least 7 times to get 3 heads?

Best Answer

The Negative Binomial Distribution is the distribution you are looking for here. The negative binomial distribution is a generalization of the geometric distribution, which models the number of trials before a single success is attained. It takes two parameters: $p$, the probability of success in each independent trial, and $k$, the number of desired successes. In your case, the probability that you will need to flip more than $x$ coins with probability of success $p_0$ to get $k_0$ heads is $1 - F(x)$, where $F$ is the cumulative distribution function of a negative binomial distribution with $p = p_0$ and $k = k_0$.

Here's another way of looking at it. If you need to flip more than $x$ coins with probability of success $p_0$ to get $k_0$ heads (successes), then in your first $x$ flips you necessarily got less than $k_0$ successes. The number of successes $j$ you got in your first $x$ flips thus ranges from $0$ to $k_0 - 1$. For a given $j$ in this range, the probability that you got exactly $j$ successes in your first $x$ flips has probability $${x \choose j} p^j (1-p)^{x-j}$$ And therefore, the probability that you will need to flip more than $x$ coins is a summation over all possible $j$, i.e. $$\mathbb{P}(\text{need more than } x \text{ flips for } k_0 \text{ successes}) = \sum_{j = 0}^{k_0 - 1} {x \choose j} p^j (1-p)^{x-j}$$ Unfortunately, there is no significantly nicer way to write the above expression. (One can write it using the incomplete beta function, but it's not much nicer.)