Calculating the probability of $x$ number of successes in $n$ trials where each success reduces the probability of success

combinatoricsprobability

E. g., there are $10$ red balls in a bag of $100$ balls, and each trial involves picking a ball out of the bag. If I get a red one, I keep it. If I do not get a red one, I place it back into the bag. I want to be able to calculate the probability that in $n$ trials I will pick $x$ red balls.

What if I keep the ball regardless of a success, and do not place any balls back into the bag. How would that change the method of calculating the probability?

Best Answer

Your second question is much easier. Number the balls $1$ to $100$, so balls $1$ to $10$ are red. If you never replace any balls, then your probability space consists of all sequences of $n$ numbers, each between $1$ and $100$, with no repeats. The number of such sequences is $100\cdot 99\cdots (100-n+1)=\frac{100!}{(100-n)!}$. A successful sequences consists of $x$ red balls and $n-x$ other balls in some order. The number of successful sequences is $\binom{10}x\cdot \binom{90}{n-x}\cdot n!$ (choose $x$ red balls, choose $n-x$ non-red balls, then order them). Therefore, the probability of success is $$ \frac{\binom{10}x\cdot \binom{90}{n-x}\cdot n!}{\frac{100!}{(100-n)!}}=\frac{\binom{10}x\cdot \binom{90}{n-x}}{\binom{100}{10}} $$ This is the hypergeometric distribution.


When you have "partial replacement," so red balls are kept and non-reds are returned, then there is no simple formula. Imagine that instead of stopping after $n$ draws, you continue until all red balls are drawn. Let $T_1$ be number of draws to get your first red ball, let $T_2$ be the number of draws it takes to get your second, and so on up to $T_{10}$. Then $T_k$ is a geometric random variable for each $k$, with probability of success $(10-(k-1))/(100-(k-1))$. That is, $$ P(T_k=m) = (1-p_k)^{m-1}p_k,\qquad \text{where }p_k=\frac{11-k}{101-k} $$ You want to find the probability that after $n$ draws, you have exactly $x$ red balls. In order for this to occur, you need to have drawn your $x^{th}$ red ball before drawn number $n$, which means that $T_1+\dots+T_x\le n$. However, you also need to not have drawn any more red balls before draw $n$, which is equivalent to saying $T_1+\dots +T_x+T_{x+1}> n$. In order words, we want to compute $$ P(T_1+\dots+T_x\le n)-P(T_1+\dots+T_x+T_{x+1}\le n) $$ A good tool for computing independent sums of discrete random variables is probability generating functions. The probability generating function for a geometric distribution $Z$ with probability of success $p$ is $$ G_{Z}(s):=\sum_{i\ge 0}P(Z=i)s^i=\frac{sp}{1-(1-p)s} $$ Furthermore, the p.g.f. for the sum of random variables is the product of their p.g.f's. Finally, we can recover the cumulative density function from a random variable $Z$ by extracting the coefficient of $x^i$ in $\frac{G_Z(s)}{1-s}$. That is, $$ P(Z\le i)=\text{coefficient of $s^i$ in } \frac{G_Z(s)}{1-s} $$ Putting this altogether, we get

\begin{align} P(\text{$x$ red balls in $n$ draws}) = \text{coefficient of $s^n$ in }\frac1{1-s}\left(\prod_{k=1}^x\frac{p_ks}{1-(1-p_k)s}\right)\left(1-\frac{p_{x+1}s}{1-(1-p_{x+1})s}\right) =\text{coefficient of $s^n$ in }\frac1{1-(1-p_{x+1})s}\left(\prod_{k=1}^{x}\frac{p_ks}{1-(1-p_k)s}\right) \end{align} This is difficult to evaluate by hand, but can be done easily with a computer if $x$ and $n$ are small enough. The following Mathematica code does this:

p[k_] := (10-(k-1))/(100-(k-1));
G[k_] := p[k]s/(1-(1-p[k])s);
Prob[n_,x_] := SeriesCoefficient[Product[G[k],{k,1,x}]/(1-(1-p[x+1])s),{s,0,n}];