Expected Number of Trials for First Success when Probability Changes Every Trial

expected value, probability, probability distributions

Let's say you have a box containing exactly 1 blue ball and 1 red ball.

At every trial, one ball is randomly picked from the box. After picking one, one blue ball is added to the box. (NOTE: After picking, the ball is returned back to the box)

What is the expected (mean) number of trials needed to pick the red ball (the first success)? Is it possible to calculate this? If so, calculate the variance as well.

MY ATTEMPT:

If I'm not wrong, the expected number of trials needed to flip the first head of a coin is 1/p (which I suspected was very similar to this). The thing that got me stuck is that the probability changes after every trial. Maybe some special kind of distribution or some limit theorem is needed to calculate this, but I'm very lost about that. Any pointers?

EDIT 1:
Please tell me if I made a mistake somewhere:
The probability that the red ball is first picked at the nth trial is the product of the probabilities of not picking it at each trial before the nth and the probability of picking it at the nth trial, so:

the probability that a red ball only got picked at 1st trial is 1/2

the probability that a red ball only got picked at 2nd trial is 1/2 * 1/3 = 1/6

the probability that a red ball only got picked at 3rd trial is 1/2 * 2/3 * 1/4 = 1/12

the probability that a red ball only got picked at 4th trial is 1/2 * 2/3 * 3/4 * 1/5 = 1/20

the probability that a red ball only got picked at 5th trial is 1/2 * 2/3 * 3/4 * 4/5 * 1/6 = 1/30

I'm seeing a pattern, but I'm still at a loss as to how to compute the expected number of trials.

EDIT 2: The ball that got picked is returned to the box afterwards. Sorry for not being clear.

EDIT 3: Because I am after the number of trials for the first success, the formula would be 1/p, correct?

Therefore the expected number of trials needed can be written as a function of n: f(n) = n(n+1) (where n is the trial number at which the red ball got picked)

However I'm still very stumped on how to calculate the mean expected trials with n still in the way…

Best Answer

Let $X$ be the random variable that counts the number of trials until the red ball is first picked. Note that $X$ is an unbounded random variable taking values in the positive integers $\{1, 2, 3, \dots\}$.

As you observed, following @lulu's advice, we have: $$ \Pr(X = k) = \prod_{i=2}^k \left(1 - \frac1i\right) \cdot \frac{1}{k+1} = \frac{1}{k\cdot(k+1)} $$

where: $$ \prod_{i=2}^k \left(1 - \frac1i\right) = \prod_{i=2}^k \left(\frac{i-1}{i}\right) = \frac12\cdot\frac23\cdot\frac34\cdot \dots \cdot \frac{k-1}{k} = \frac1k $$
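The telescoping product can be checked with exact rational arithmetic. The sketch below is only an illustration; the function name `prob_first_red_at` is made up for this example and simply multiplies the per-trial probabilities described in the question.

```python
from fractions import Fraction


def prob_first_red_at(k: int) -> Fraction:
    """Exact P(X = k): miss on trials 1..k-1, then hit on trial k."""
    prob = Fraction(1)
    for i in range(1, k):
        # Trial i is drawn from i blue balls and 1 red ball,
        # so missing the red ball has probability i/(i+1).
        prob *= Fraction(i, i + 1)
    # Trial k is drawn from k blue balls and 1 red ball.
    return prob * Fraction(1, k + 1)


for k in range(1, 6):
    print(k, prob_first_red_at(k))  # 1/2, 1/6, 1/12, 1/20, 1/30
```

The printed values match the ones worked out by hand in the question, and each equals $\frac{1}{k(k+1)}$.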

Now, the expected value is: $$ \mathbb{E}[X] = \sum_{k=1}^{\infty} k \cdot \Pr(X=k) = \sum_{k=1}^{\infty} \frac{k}{k\cdot(k+1)} = \sum_{k=1}^{\infty} \frac{1}{k+1} $$

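The sum is the harmonic series with its first term removed, which diverges, so $\mathbb{E}[X] = \infty$. The second moment $\mathbb{E}[X^2] = \sum_{k=1}^{\infty} \frac{k}{k+1}$ diverges as well, and thus $\mathrm{Var}[X]$ is undefined.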
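To see the divergence numerically, one can truncate the sum $\sum_k k\cdot\Pr(X=k)$ at larger and larger cutoffs. This is a rough sketch (the helper name `truncated_mean` is an assumption for illustration); the partial sums keep growing, roughly like the logarithm of the cutoff, instead of settling on a value.

```python
import math


def truncated_mean(n_max: int) -> float:
    """Sum of k * P(X = k) for k = 1..n_max, which simplifies to sum of 1/(k+1)."""
    return sum(1.0 / (k + 1) for k in range(1, n_max + 1))


for n_max in (10, 1_000, 100_000):
    # Print the cutoff, the truncated sum, and log(cutoff) for comparison.
    print(n_max, truncated_mean(n_max), math.log(n_max))
```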

Regarding your comment:

"Does that mean that the expected number of trials cannot be calculated in this case?"

That is the final calculation: $X$ has infinite expectation.

It also makes sense intuitively, since if you don't pick up the red ball at some trial $k$, then you make it even harder (i.e. less probable, by adding a new blue ball) to do so at some later trial - you're increasing the probability of failure.

Related Solutions

Calculating the probability of $x$ number of successes in $n$ trials where each success reduces the probability of success

Your second question is much easier. Number the balls $1$ to $100$, so that balls $1$ to $10$ are red. If you never replace any balls, then your probability space consists of all sequences of $n$ numbers, each between $1$ and $100$, with no repeats. The number of such sequences is $100\cdot 99\cdots (100-n+1)=\frac{100!}{(100-n)!}$. A successful sequence consists of $x$ red balls and $n-x$ other balls in some order. The number of successful sequences is $\binom{10}x\cdot \binom{90}{n-x}\cdot n!$ (choose $x$ red balls, choose $n-x$ non-red balls, then order them). Therefore, the probability of success is $$ \frac{\binom{10}x\cdot \binom{90}{n-x}\cdot n!}{\frac{100!}{(100-n)!}}=\frac{\binom{10}x\cdot \binom{90}{n-x}}{\binom{100}{n}} $$ This is the hypergeometric distribution.
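As a sanity check on the counting argument, the sketch below compares the ratio of sequence counts with the hypergeometric form $\binom{10}{x}\binom{90}{n-x}/\binom{100}{n}$ using exact fractions. The function names and the default arguments `reds=10`, `total=100` are illustrative choices, not part of the original answer.

```python
from fractions import Fraction
from math import comb, factorial, perm


def prob_by_counting(n: int, x: int, reds: int = 10, total: int = 100) -> Fraction:
    """Successful ordered sequences divided by all ordered sequences of n distinct balls."""
    favourable = comb(reds, x) * comb(total - reds, n - x) * factorial(n)
    return Fraction(favourable, perm(total, n))  # perm(total, n) = total!/(total-n)!


def prob_hypergeometric(n: int, x: int, reds: int = 10, total: int = 100) -> Fraction:
    """Hypergeometric probability of exactly x reds in n draws without replacement."""
    return Fraction(comb(reds, x) * comb(total - reds, n - x), comb(total, n))


# The two expressions agree, and the probabilities over x sum to 1.
print(prob_by_counting(5, 2) == prob_hypergeometric(5, 2))
print(sum(prob_hypergeometric(5, x) for x in range(6)))
```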

When you have "partial replacement," so red balls are kept and non-reds are returned, there is no simple formula. Imagine that instead of stopping after $n$ draws, you continue until all red balls are drawn. Let $T_1$ be the number of draws needed to get your first red ball, let $T_2$ be the number of additional draws needed to get your second, and so on up to $T_{10}$. Then the $T_k$ are independent, and each $T_k$ is a geometric random variable with probability of success $(10-(k-1))/(100-(k-1))$. That is, $$ P(T_k=m) = (1-p_k)^{m-1}p_k,\qquad \text{where }p_k=\frac{11-k}{101-k} $$

You want to find the probability that after $n$ draws, you have exactly $x$ red balls. For this to occur, you need to have drawn your $x^{th}$ red ball by draw number $n$, which means that $T_1+\dots+T_x\le n$. However, you also need to not have drawn any more red balls by draw $n$, which is equivalent to saying $T_1+\dots +T_x+T_{x+1}> n$. In other words, we want to compute $$ P(T_1+\dots+T_x\le n)-P(T_1+\dots+T_x+T_{x+1}\le n) $$

A good tool for computing sums of independent discrete random variables is probability generating functions. The probability generating function for a geometric random variable $Z$ with probability of success $p$ is $$ G_{Z}(s):=\sum_{i\ge 1}P(Z=i)s^i=\frac{sp}{1-(1-p)s} $$ Furthermore, the p.g.f. of a sum of independent random variables is the product of their p.g.f.s. Finally, we can recover the cumulative distribution function of a random variable $Z$ by extracting the coefficient of $s^i$ in $\frac{G_Z(s)}{1-s}$. That is, $$ P(Z\le i)=\text{coefficient of $s^i$ in } \frac{G_Z(s)}{1-s} $$ Putting this all together, we get

\begin{align}
P(\text{$x$ red balls in $n$ draws})
&= \text{coefficient of $s^n$ in }\frac1{1-s}\left(\prod_{k=1}^x\frac{p_ks}{1-(1-p_k)s}\right)\left(1-\frac{p_{x+1}s}{1-(1-p_{x+1})s}\right) \\
&= \text{coefficient of $s^n$ in }\frac1{1-(1-p_{x+1})s}\left(\prod_{k=1}^{x}\frac{p_ks}{1-(1-p_k)s}\right)
\end{align}
This is difficult to evaluate by hand, but can be done easily with a computer if $x$ and $n$ are small enough. The following Mathematica code does this:

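(* p[k]: chance that a draw is red once k-1 red balls have been removed *)
p[k_] := (10 - (k - 1))/(100 - (k - 1));
(* G[k]: p.g.f. of the geometric waiting time T_k *)
G[k_] := p[k] s/(1 - (1 - p[k]) s);
(* Prob[n, x]: coefficient of s^n, i.e. P(exactly x red balls in n draws) *)
Prob[n_, x_] := SeriesCoefficient[Product[G[k], {k, 1, x}]/(1 - (1 - p[x + 1]) s), {s, 0, n}];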
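For readers without Mathematica, the same probability can be cross-checked in Python. Note that the sketch below does not use generating functions: it swaps in a direct dynamic programme over the number of red balls removed so far, which should give the same numbers. The function name and default arguments are assumptions made for this example.

```python
from fractions import Fraction


def prob_reds_after_draws(n: int, x: int, reds: int = 10, total: int = 100) -> Fraction:
    """P(exactly x reds drawn in n draws) when reds are kept and non-reds are returned."""
    # dist[j] = probability that j red balls have been removed so far.
    dist = [Fraction(0)] * (reds + 1)
    dist[0] = Fraction(1)
    for _ in range(n):
        new = [Fraction(0)] * (reds + 1)
        for j, pj in enumerate(dist):
            if pj == 0:
                continue
            # With j reds gone, reds - j of the total - j balls are red (p_{j+1} in the text).
            p_red = Fraction(reds - j, total - j)
            new[j] += pj * (1 - p_red)      # non-red drawn and returned
            if j < reds:
                new[j + 1] += pj * p_red    # red drawn and kept out
        dist = new
    return dist[x]


print(prob_reds_after_draws(2, 1))  # one red in two draws
```

Exact fractions keep the comparison with the symbolic series coefficient free of rounding error; for large $n$ floats would be faster.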

Expected Number of Trials to Achieve a Result With Dependent Probability

The expected number of attempts for $n$ successes is $n$ times the expected number of attempts for one success, and it is easier to look at the expected number of attempts for one success.

The probability of no success after $m$ attempts when $m\le t$ is $(1-p)^m$
The probability of no success after $m$ attempts when $t \lt m\le t+\frac{1-p}{q}$ is $(1-p)^t \prod\limits_{k=1}^{m-t}(1-p-kq)$
The probability of no success after $m$ attempts when $t+\frac{1-p}{q} \lt m$ is $0$

If you add these all up over $m \ge 0$ to give the expected number of attempts for one success and then multiply by $n$, then you get $$n \left(\frac{1-(1-p)^{t+1}}{p} + (1-p)^{t}\sum\limits_{m=t+1}^{t+\lfloor(1-p)/q\rfloor} \prod\limits_{k=1}^{m-t}(1-p-kq)\right)$$

This will not have a closed form, but in the example in the question with $n=3$, $p=q=0.02$ and $t=50$ it seems to give about $3\times 34.594555=103.783665$ from the earlier result.
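The expected number of attempts for one success can also be obtained by summing the no-success probabilities listed above over $m = 0, 1, 2, \dots$ directly. The sketch below assumes the reading implied by those cases: the success chance is $p$ on each of the first $t$ attempts and $p + kq$ on attempt $t+k$. The function name is hypothetical.

```python
def expected_attempts_one_success(p: float, q: float, t: int) -> float:
    """Sum P(no success after m attempts) over m = 0, 1, 2, ..."""
    total = 0.0
    survive = 1.0  # P(no success after 0 attempts)
    m = 0
    while survive > 0.0:
        total += survive
        m += 1
        # Failure chance on attempt m: 1 - p while m <= t, then 1 - p - (m - t) * q,
        # clamped at 0 once success has become certain.
        fail = 1 - p if m <= t else max(0.0, 1 - p - (m - t) * q)
        survive *= fail
    return total


# Example from the text: n = 3, p = q = 0.02, t = 50 (should land near 103.78).
print(3 * expected_attempts_one_success(0.02, 0.02, 50))
```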

It may be possible to give approximations, at least for large $t$ and small $p$ and $q$ in special circumstances. For example if $c=\frac1p=\frac1q$ for some integer $c$ then I think you may be able to use, if I have done this correctly, an approximation like $$n\left(c-e^{-t/c}\left(c-\sqrt{\frac \pi 2}\sqrt{c}+\frac56-\frac7{12}\sqrt{\frac \pi 2}\sqrt{\frac{1}{c}} + \frac{617}{1080}\frac1c -\cdots\right) \right) $$

which with $n=3, t=50$ and $c=50$ gives about $103.7806$, which is not far away from the earlier result.
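The approximation is simple to evaluate. The sketch below just transcribes the displayed series, truncated after the $\frac{617}{1080}\frac1c$ term; the function name `approx_attempts` is made up for this illustration.

```python
import math


def approx_attempts(n: int, t: int, c: int) -> float:
    """Truncated series approximation for the case c = 1/p = 1/q."""
    root = math.sqrt(math.pi / 2)
    series = (
        c
        - root * math.sqrt(c)
        + 5 / 6
        - (7 / 12) * root / math.sqrt(c)
        + (617 / 1080) / c
    )
    return n * (c - math.exp(-t / c) * series)


print(approx_attempts(3, 50, 50))  # compare with the value of about 103.7806 quoted above
```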
