Find the binomial probability mass function of a binomial random variable

binomial distribution, probability, probability theory, random variables

I have the following case:

Items are sampled in batches, independently. The items are identical, and each one is diseased with probability $p$. The randomly selected batches are of size $n$. There is a threshold $d < n$: if the number of diseased items in a batch is at or above $d$, we reject the whole batch (of size $n$).

This means that the number of diseased items in a batch can be modeled as $X \sim Bin(n,p)$. The probability of rejecting the whole batch is then $P(X \ge d)$, which is calculated as

$$ P(X \ge d) = \sum_{k=d}^n {}_nC_k p^k (1-p)^{n-k} $$

On top of this, suppose batches are drawn at random, say $N$ times. The probability that a batch is thrown out was given above as $P(X \ge d)$; define $r = P(X \ge d)$. Then the number of batches thrown out can be modeled as another binomial random variable, $Y \sim Bin(N, r)$, and the probability that exactly $m$ batches are thrown out is

$$ P(Y = m) = {}_NC_m r^m (1-r)^{N-m} $$

Since the number of diseased items in a batch is itself binomial ($X \sim Bin(n, p)$), and the number of rejected batches in a collection of batches (a batch of batches) is $Y \sim Bin(N, r)$, is there a way to model this binomial random variable of binomial random variables as another binomial (or other discrete mixture) of binomial random variables? This is not a simple combination of binomial random variables, because a batch is thrown out only when it reaches the threshold $d$. If the combined random variable $Y$ were just the total number of defects, it would be a simple sum over the $N$ independent draws, but since a batch is rejected based on a threshold on $X$, this is not the case.

If there are other random variables that can easily model this, I would be happy to read more about those types of random variable(s).

Best Answer

This is a hierarchical model:

$$\begin{aligned} X &\sim \operatorname{Binomial}(n,p), \\ Y \mid X, d &\sim \operatorname{Binomial}(N, r_d), \\ r_d &= \Pr[X \ge d] = \sum_{x=d}^n \binom{n}{x} p^x (1-p)^{n-x}. \end{aligned}$$

Then the unconditional or marginal distribution of $Y$ is

$$\Pr[Y = y \mid N, n, p, d] = \binom{N}{y} \left(\sum_{x=d}^n \binom{n}{x} p^x (1-p)^{n-x} \right)^y \left( \sum_{x=0}^{d-1} \binom{n}{x} p^x (1-p)^{n-x} \right)^{N-y}.$$ We can also write this as $$\Pr[Y = y \mid N, n, p, d] = \binom{N}{y} (1-p)^{Nn} \left(\sum_{x=d}^n \binom{n}{x} \left(\frac{p}{1-p}\right)^x \right)^y \left( \sum_{x=0}^{d-1} \binom{n}{x} \left(\frac{p}{1-p}\right)^x \right)^{N-y}.$$
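To see where the second form comes from (assuming $p < 1$), one can factor $(1-p)^n$ out of every summand:

$$\binom{n}{x} p^x (1-p)^{n-x} = (1-p)^n \binom{n}{x} \left(\frac{p}{1-p}\right)^x.$$

The first sum raised to the power $y$ then contributes $(1-p)^{ny}$ and the second sum raised to the power $N-y$ contributes $(1-p)^{n(N-y)}$, so the overall prefactor is $(1-p)^{ny + n(N-y)} = (1-p)^{Nn}$.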

Not much more can be done to simplify this expression. The key point is that the unconditional distribution is still binomial, $Y \sim \operatorname{Binomial}(N, r_d)$, because the probability $r_d$ does not depend on a realization of $X$; rather, it is a function only of the model parameters $n, p, d$.
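One way to convince yourself of this is a quick Monte Carlo check. The sketch below simulates the two-stage process item by item and compares the empirical distribution of the number of rejected batches with the $\operatorname{Binomial}(N, r_d)$ pmf. The function names, the seed, the number of trials, and the parameter values ($N = 20$, $n = 10$, $p = 0.1$, $d = 3$) are arbitrary choices made for illustration only.

```python
import random
from math import comb


def binomial_pmf(k, size, prob):
    """Pr[K = k] for K ~ Binomial(size, prob)."""
    return comb(size, k) * prob**k * (1 - prob) ** (size - k)


def simulate_rejected(N, n, p, d, rng):
    """Draw N batches of n items each; return how many batches are rejected."""
    rejected = 0
    for _ in range(N):
        # Each item is diseased independently with probability p.
        diseased = sum(rng.random() < p for _ in range(n))
        if diseased >= d:
            rejected += 1
    return rejected


# Illustrative parameters (assumptions, not taken from the question).
N, n, p, d = 20, 10, 0.1, 3
trials = 5000
rng = random.Random(12345)  # fixed seed so the run is repeatable

# Exact per-batch rejection probability r_d = Pr[X >= d].
r_d = sum(binomial_pmf(x, n, p) for x in range(d, n + 1))

# Empirical distribution of Y over many simulated runs.
counts = [0] * (N + 1)
for _ in range(trials):
    counts[simulate_rejected(N, n, p, d, rng)] += 1
empirical = [c / trials for c in counts]
exact = [binomial_pmf(y, N, r_d) for y in range(N + 1)]

for y in range(6):
    print(f"y={y}: simulated {empirical[y]:.4f}, Binomial(N, r_d) {exact[y]:.4f}")
```

Up to sampling noise, the two columns should agree, which is what one expects if $Y$ is binomial with success probability $r_d$.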

For example, say we take $N = 50$ batches, each of which has size $n = 7$ items, and the probability of observing a defective item is $p = 0.01$. If we require the observation of at least $d = 1$ defect in a batch to reject it, then the random number of rejected batches is binomial with parameters $N = 50$ and $$r_d = \sum_{x=1}^7 \binom{7}{x} (0.01)^x (0.99)^{7-x} = 1 - \binom{7}{0} (0.99)^7 = 0.0679347.$$ The probability of rejecting at least $2$ batches out of the $50$ would be $$\Pr[Y \ge 2] = 1 - \Pr[Y \le 1] = 1 - \binom{50}{0} r_d^0 (1-r_d)^{50} - \binom{50}{1} r_d^1 (1-r_d)^{49} \approx 0.862203.$$
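The arithmetic in this example can be reproduced with a few lines of Python using only the standard library. This is a minimal sketch; the helper names `binomial_pmf` and `upper_tail` are made up for illustration.

```python
from math import comb


def binomial_pmf(k, size, prob):
    """Pr[K = k] for K ~ Binomial(size, prob)."""
    return comb(size, k) * prob**k * (1 - prob) ** (size - k)


def upper_tail(threshold, size, prob):
    """Pr[K >= threshold] for K ~ Binomial(size, prob)."""
    return sum(binomial_pmf(k, size, prob) for k in range(threshold, size + 1))


# Parameters from the worked example.
N, n, p, d = 50, 7, 0.01, 1

# Probability that a single batch is rejected.
r_d = upper_tail(d, n, p)

# Probability that at least 2 of the N batches are rejected.
prob_at_least_two = upper_tail(2, N, r_d)

print(f"r_d = {r_d:.7f}")
print(f"Pr[Y >= 2] = {prob_at_least_two:.6f}")
```

The same two helpers cover other thresholds: changing `d` changes only `r_d`, and the second call stays a plain binomial tail in `N` and `r_d`.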
