[Math] Expected number of Failures within K trials for Binomial RV

binomial distribution, conditional probability, expected value, probability, random variables

Question:

Given a binomial random variable with probability of success p and probability of failure (1-p), what is the expected number of failures given that a success was observed within k trials and the trials were then halted?

For example, if k=3 then the observations that qualify would be:

Failure, Failure, Success

Failure, Success

Success

Note that the following observations do NOT qualify, even though a success occurs within the first k trials, because the trials were not halted after the first success:

Failure, Success, Failure

Success, Failure, Failure

Failure, Success, Success

ETC…

And of course these observations do NOT qualify because there are no successes in them at all:

Failure, Failure, Failure

Failure, Failure
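
To make the setup concrete, here is a minimal simulation sketch of the process described above. It assumes independent trials, halts each run at the first success, and discards runs in which all k trials fail. The function name `simulate_failures`, the values p = 0.2 and k = 3, the run count, and the seed are illustrative assumptions, not part of the question.

```python
import random

def simulate_failures(p, k, runs=200_000, seed=0):
    """Estimate the mean number of failures before the first success,
    counting only runs whose first success arrives within k trials."""
    rng = random.Random(seed)
    total_failures = 0
    qualifying_runs = 0
    for _ in range(runs):
        failures = 0
        succeeded = False
        for _ in range(k):
            if rng.random() < p:  # success: halt this run immediately
                succeeded = True
                break
            failures += 1
        if succeeded:  # runs with k straight failures do not qualify
            total_failures += failures
            qualifying_runs += 1
    return total_failures / qualifying_runs

# Illustrative values only (assumed, not given in the question).
print(simulate_failures(p=0.2, k=3))
```

Because every run stops at its first success, sequences such as Failure, Success, Failure never arise, which matches the qualifying observations listed above.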

My solution:

In the example for k=3, I think we can calculate the expected number of failures in the following way. First, calculate the total probability of observations that qualify:

$$\text{Total Conditional Probability} = \text{TCP} = P(\text{Failure, Failure, Success}) + P(\text{Failure, Success}) + P(\text{Success}) = p(1-p)^2 + p(1-p) + p$$

Then, we can answer the question by calculating the expected number of failures over all the observations that qualify and normalizing it by the total probability of that space (TCP):

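$$\frac{P(\text{Failure, Failure, Success}) \cdot \text{NumFailures}(\text{Failure, Failure, Success}) + P(\text{Failure, Success}) \cdot \text{NumFailures}(\text{Failure, Success}) + P(\text{Success}) \cdot \text{NumFailures}(\text{Success})}{\text{TCP}}$$

$$\frac{P(\text{Failure, Failure, Success}) \cdot 2 + P(\text{Failure, Success}) \cdot 1 + P(\text{Success}) \cdot 0}{\text{TCP}}$$

$$\frac{p(1-p)^2 \cdot 2 + p(1-p) \cdot 1 + p \cdot 0}{\text{TCP}}$$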

We then repeat this procedure using a general k. The formula for TCP is:

$$\text{TCP} = \sum_{i=0}^{k-1} p(1-p)^i$$

And the expected number of failures for general k:

$$\frac{\sum_{i=0}^{k-1} i \, p(1-p)^i}{\text{TCP}}$$
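
As a sanity check on the general-k formula, the two sums can be evaluated exactly with rational arithmetic. This is only a sketch: the function name `expected_failures_sum` and the test value p = 1/5 are assumptions chosen for illustration.

```python
from fractions import Fraction

def expected_failures_sum(p, k):
    """Evaluate the ratio of sums directly: sum_i i*p*(1-p)^i divided by TCP."""
    q = 1 - p
    tcp = sum(p * q**i for i in range(k))           # total probability of qualifying runs
    weighted = sum(i * p * q**i for i in range(k))  # failures weighted by probability
    return weighted / tcp

# Illustrative value p = 1/5 (assumed); k = 3 matches the example above.
print(expected_failures_sum(Fraction(1, 5), 3))  # 52/61
```

For k = 3 the numerator is 2p(1-p)^2 + p(1-p), so with p = 1/5 it equals 52/125 while TCP equals 61/125, giving 52/61.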

I would also like to know if there is a way to simplify this formula, assuming it is correct. Thank you.

Best Answer

A binomial random variable counts the number of "successes" among a fixed number of trials $n$, without regard to the order in which the outcomes of those trials occur.

Since you stop observing trials after the first success is observed, a more appropriate probability model is a geometric random variable, which in your case would count the number of failures before the first success is observed. This is given by

$$\Pr[Y = y] = p (1-p)^y, \quad y \in \{0, 1, 2, 3, \ldots\}.$$

Your question then amounts to

$$\operatorname{E}[Y \mid Y \le k-1];$$

this is the conditional expectation of the number of failures given that there are at most $k-1$ such failures (which implies that the first success occurs by the $k^{\rm th}$ trial). To calculate this, note

$$\Pr[Y \le k-1] = \sum_{y=0}^{k-1} \Pr[Y = y] = \sum_{y=0}^{k-1} p(1-p)^y = 1 - (1-p)^k.$$

Thus

$$\operatorname{E}[Y \mid Y \le k-1]\Pr[Y \le k-1] = \sum_{y=0}^{k-1} \operatorname{E}[Y \mid Y = y]\Pr[Y = y] = \sum_{y=0}^{k-1} y p(1-p)^y = \frac{1-p - (1-p)^k(1-p+kp)}{p}.$$

Therefore, dividing both sides by $\Pr[Y \le k-1] = 1 - (1-p)^k$,

$$\operatorname{E}[Y \mid Y \le k-1] = \frac{1}{p} + k - 1 - \frac{k}{1-(1-p)^k}.$$

So for $k = 3$ and $p = 1/5$, we have $52/61$.
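
One way to gain confidence in the closed form is to compare it against the direct sum for a handful of parameters. The sketch below uses exact fractions; the function names and the particular grid of $p$ and $k$ values are assumptions made for illustration.

```python
from fractions import Fraction

def expected_failures_closed(p, k):
    """Closed form: 1/p + k - 1 - k / (1 - (1-p)^k)."""
    q = 1 - p
    return 1 / p + k - 1 - k / (1 - q**k)

def expected_failures_direct(p, k):
    """Direct evaluation of E[Y | Y <= k-1] from the geometric pmf."""
    q = 1 - p
    numerator = sum(y * p * q**y for y in range(k))
    denominator = sum(p * q**y for y in range(k))  # equals 1 - q**k
    return numerator / denominator

# Exact comparison on a small, arbitrarily chosen grid of parameters.
for p in (Fraction(1, 5), Fraction(1, 2), Fraction(9, 10)):
    for k in range(1, 8):
        assert expected_failures_closed(p, k) == expected_failures_direct(p, k)

print(expected_failures_closed(Fraction(1, 5), 3))  # 52/61
```

Using `Fraction` rather than floats keeps the comparison exact, so agreement on the grid is not an artifact of rounding. It is still only a spot check, not a proof of the identity.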