Probability – Conditional Probability of Finding a Defective Item Amongst $k\times m$ Items

bayes-theorem, conditional-probability, probability

There are $k$ packages, each with $m$ items. Exactly one of the $k \cdot m$ items is defective. To find the defect, $n$ items are randomly selected from each package and inspected. I wish to determine the probabilities that (a) the defect is in the first package ($B_1$), given that it is not found in the first package ($F_1^c$), and (b) the defect is in the second package ($B_2$), given that it is not found in the first package ($F_1^c$).

To do so, I have determined the following:

(a) Intuitively, $P(B_1) = 1/k$ (because each box has an equal probability of containing the defect), $P(F_1|B_1) = n/m$ (because we're sampling $n$ out of $m$ items) and $P(F_1) = n/(mk)$ (from the law of total probability).
Thus, Bayes' Rule leads to
\begin{align} P(B_1|F_1^c) = \frac{P(F_1^c|B_1)P(B_1)}{P(F_1^c)} = \frac{(1-n/m)(1/k)}{1-n/(mk)}. \end{align}
(b) Intuitively, $P(B_2) = 1/k$ and $P(F_1^c|B_2) = 1$, because, if the object is located in package 2, then it will not be found in package 1. Again, Bayes' Rule leads to,
\begin{align} P(B_2|F_1^c) = \frac{P(F_1^c|B_2)P(B_2)}{P(F_1^c)} = \frac{1/k}{1-n/(mk)}. \end{align}
Both final expressions can be simplified a little. Is this reasoning for the conditional probabilities correct?

Best Answer

$$ P(B_1\mid F^c_1) = \frac{P(B_1, F^c_1)}{P(F^c_1)} =\frac{P(F^c_1\mid B_1)P(B_1)}{P(F^c_1)}. $$ If the defect is certainly in package 1, the probability of finding it equals the fraction of items inspected; equivalently, the number of defects found is $\mathrm{Hypergeo}(m, 1, n)$, so $$ P(F_1\mid B_1) = n/m. $$ Further, by the law of total probability, $$ P(F_1) = P(F_1\mid B_1)P(B_1) + P(F_1\mid B^c_1)P(B^c_1) = \frac{n}{m}\frac{1}{k} + 0 = \frac{n}{mk} $$ so, since $P(B_j) = 1/k$, $$ P(B_1\mid F^c_1) = \frac{(1 - n/m)/k}{1 - n/(mk)} = \frac{m-n}{km -n} $$ i.e. "remaining objects in 1" / "total objects remaining".

The second part works the same way: $$ P(B_2\mid F^c_1) = \frac{P(B_2, F^c_1)}{P(F^c_1)} =\frac{P(F^c_1\mid B_2)P(B_2)}{P(F^c_1)}. $$ Here $P(F^c_1\mid B_2) = 1 - 0 = 1$, so $$ P(B_2\mid F^c_1) = \frac{1/k}{1 - n/(mk)} = \frac{m}{km -n} $$ or, "objects in 2" / "total objects remaining".
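
As a quick sanity check on the two closed forms (a sketch, using the fact that by symmetry every package $j \ge 2$ has the same posterior as package 2), the posterior probabilities over all $k$ packages should sum to one: $$ \frac{m-n}{km-n} + (k-1)\,\frac{m}{km-n} = \frac{km-n}{km-n} = 1. $$ For instance, with the illustrative values $k=3$, $m=10$, $n=4$ (chosen here, not taken from the question), $$ P(B_1\mid F^c_1) = \frac{6}{26} = \frac{3}{13}, \qquad P(B_2\mid F^c_1) = \frac{10}{26} = \frac{5}{13}, $$ and indeed $3/13 + 2\cdot 5/13 = 1$.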

In conclusion, these answers agree with the expressions derived in the question and with intuition.

Another, perhaps simpler, formulation is to define two indicator vectors: $X$ for the observed items and $Y$ for the unobserved ones. Each has $k$ components, one per package, and exactly one of all the entries takes the value 1 while all others are zero. Define $M=mk$ and set $n_y = m-n$ so that $P(Y_j=1) = n_y/M$ and $P(X_j=1) = n/M$ for each package $j$.

This formulation separates the packages from each other, so $P(B_1\mid F^c_1) = P(Y_1 = 1\mid X_1 = 0) = n_y/(M-n)$. This can be visualized as the area occupied by $Y_1$ divided by the total area in which the 1 could still lie. Finally, $$ P(B_2\mid F^c_1) = P(X_2 + Y_2 = 1\mid X_1 = 0) = (n_y+n)/(M-n) = m/(km-n). $$
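
The counting picture above can be checked by brute force for small cases. The following is only a sketch: the function name `posterior_by_enumeration` and the test values are made up for illustration, and it assumes $k \ge 2$ and $n < mk$ so that the conditioning event has positive probability. It enumerates every equally likely pair (defect location, inspected subset of package 1) and takes ratios of counts, using only the inspection of package 1 since that is the event being conditioned on.

```python
from fractions import Fraction
from itertools import combinations


def posterior_by_enumeration(k, m, n):
    """Exact P(B_1 | F_1^c) and P(B_2 | F_1^c) by brute-force counting.

    Every (defect location, inspected subset of package 1) pair is equally
    likely, so conditional probabilities are ratios of counts.
    Assumes k >= 2 and n < m * k (hypothetical helper, for illustration).
    """
    not_found = 0  # outcomes where inspecting package 1 misses the defect
    in_pkg1 = 0    # ... and the defect sits in package 1
    in_pkg2 = 0    # ... and the defect sits in package 2
    for sample in combinations(range(m), n):
        inspected = set(sample)
        for pkg in range(k):
            for slot in range(m):
                found = (pkg == 0 and slot in inspected)
                if found:
                    continue
                not_found += 1
                in_pkg1 += (pkg == 0)
                in_pkg2 += (pkg == 1)
    return Fraction(in_pkg1, not_found), Fraction(in_pkg2, not_found)


k, m, n = 3, 5, 2  # illustrative values only
p1, p2 = posterior_by_enumeration(k, m, n)
print(p1, Fraction(m - n, k * m - n))  # enumeration vs. (m-n)/(km-n)
print(p2, Fraction(m, k * m - n))      # enumeration vs. m/(km-n)
```

Using `Fraction` keeps the comparison exact, so agreement with the closed forms is not blurred by floating-point rounding. The enumeration grows combinatorially in $m$ and $n$, so it is only practical for small instances.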