Probability – Solving a Weird Probability Question

combinatorics, probability, solution-verification

This is the problem in question:

You have two identical bowls: the first one contains 3 white balls and 4 black balls, and the second one contains 4 white balls and 5 black balls. If you randomly choose a ball from the two bowls, what is the probability that it is white?

Let's define our events as such:

$A_1$ = choosing a ball from the first bowl

$A_2$ = choosing a ball from the second bowl

$B$ = choosing a white ball

One approach would be using the theorem of total probability:

$$\text{We know that }P(B|A_1) = \frac37\text{ and }P(B|A_2) = \frac49\text{, and that:}$$

$$P(A_1) = P(A_2) = \frac12\text{, because the bowls are identical}$$

$$P(B) = P(B|A_1)\times P(A_1) + P(B|A_2)\times P(A_2) = \frac37 \times \frac12 + \frac49 \times \frac12 = \frac{55}{126}$$

The second approach would be simplifying the problem:

Because the two bowls are identical, we could just say we don't even choose between two bowls, but just between the set of all balls.

Then we could calculate the probability directly:

$$P(B) = \frac {\text{number of white balls}} {\text{total number of balls}} = \frac7 {16}$$
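Both calculations can be checked with exact rational arithmetic. The following is a minimal sketch; the function names `two_stage` and `pooled` are illustrative labels for the two approaches, not terms from the problem itself.

```python
from fractions import Fraction

def two_stage(bowls):
    """P(white) when a bowl is picked uniformly, then a ball uniformly within it."""
    return sum(Fraction(w, w + b) for w, b in bowls) / len(bowls)

def pooled(bowls):
    """P(white) when one ball is picked uniformly from all balls combined."""
    white = sum(w for w, _ in bowls)
    total = sum(w + b for w, b in bowls)
    return Fraction(white, total)

# Each bowl is (white, black): bowl 1 has 3 white and 4 black, bowl 2 has 4 white and 5 black.
bowls = [(3, 4), (4, 5)]
p_two_stage = two_stage(bowls)
p_pooled = pooled(bowls)
print(p_two_stage, p_pooled)  # prints: 55/126 7/16
```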

Now, which one is correct? (and why?)

Both solutions seem reasonable, and they give approximately the same value:

$$\frac{55}{126}\approx0.4365$$

$$\frac{7}{16}\approx0.4375$$

However, mathematically speaking, they are different results. Which one is correct?

Best Answer

The difference in probabilities arises because the two sampling methods are different.

To understand why, consider an extreme case where one bowl has a single white ball, and the second bowl has $100$ black and $99$ white balls.

If you use the first sampling method (choose one of the two bowls with equal probability, then choose a ball uniformly at random from that bowl), it is immediately clear that you have at least a $1/2$ probability of drawing a white ball, because choosing the bowl with the single white ball guarantees a white ball. If you pick the second bowl instead, you have a $99/199 \approx 1/2$ chance of drawing a white ball, so the total probability is $\frac12 \cdot 1 + \frac12 \cdot \frac{99}{199} \approx 3/4$.

But if you use the second sampling method, the pooled set contains $100$ white and $100$ black balls, so the probability of drawing a white ball is exactly $1/2$.
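As a quick check of the extreme case, here is a small sketch that computes both probabilities exactly. The variable names are illustrative only.

```python
from fractions import Fraction

# Extreme example: bowl 1 holds 1 white ball; bowl 2 holds 99 white and 100 black balls.
# Two-stage sampling: pick a bowl with probability 1/2, then a ball uniformly within it.
p_two_stage = Fraction(1, 2) * Fraction(1, 1) + Fraction(1, 2) * Fraction(99, 199)

# Pooled sampling: pick one ball uniformly from all 200 balls.
p_pooled = Fraction(1 + 99, 1 + 99 + 100)

print(p_two_stage, float(p_two_stage))  # 149/199, roughly 0.7487
print(p_pooled)                         # 1/2
```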

So what is going on here is that the hierarchical sampling approach (choose a bowl, then choose a ball in that bowl) does not give every ball the same chance of being drawn whenever the bowls contain different numbers of balls. Applied to the extreme example, this method picks the single white ball half of the time, whereas under the second sampling method that ball is chosen on average only $1/200$ of the time.
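The per-ball selection probabilities make this concrete. The sketch below assumes the extreme example (bowls of size 1 and 199) and uses illustrative variable names.

```python
from fractions import Fraction

# Extreme example: bowl 1 has 1 ball; bowl 2 has 99 white + 100 black = 199 balls.
bowl_sizes = [1, 199]

# Two-stage sampling: each bowl is chosen with probability 1/2, then each ball
# in a bowl of size n is chosen with probability 1/n.
two_stage_per_ball = [Fraction(1, len(bowl_sizes)) * Fraction(1, n) for n in bowl_sizes]

# Pooled sampling: every one of the 200 balls is equally likely.
pooled_per_ball = Fraction(1, sum(bowl_sizes))

print(two_stage_per_ball[0])  # 1/2   (the lone white ball)
print(two_stage_per_ball[1])  # 1/398 (any single ball in the large bowl)
print(pooled_per_ball)        # 1/200 (any ball under pooled sampling)
```

Under two-stage sampling the lone ball is 199 times more likely to be drawn than any individual ball in the large bowl, which is why the white balls end up over-represented.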


So the question of "which probability is correct" depends on what kind of sampling procedure you use. This is why, in the analysis of experimental data, it is important to consider how sampling takes place.
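The two sampling procedures can also be compared empirically. The following is a rough Monte Carlo sketch on the extreme example; the `simulate` helper, the trial count, and the seed are arbitrary choices made for illustration.

```python
import random

def simulate(bowls, trials=100_000, seed=0):
    """Return the observed white-ball frequency under (two-stage, pooled) sampling."""
    rng = random.Random(seed)
    # Pooled sampling ignores the bowls and draws from all balls at once.
    pool = [ball for bowl in bowls for ball in bowl]
    two_stage_hits = 0
    pooled_hits = 0
    for _ in range(trials):
        # Two-stage: pick a bowl uniformly, then a ball uniformly within that bowl.
        two_stage_hits += rng.choice(rng.choice(bowls)) == "W"
        pooled_hits += rng.choice(pool) == "W"
    return two_stage_hits / trials, pooled_hits / trials

# Extreme example: one bowl with a single white ball, one with 99 white and 100 black.
extreme = [["W"], ["W"] * 99 + ["B"] * 100]
freq_two_stage, freq_pooled = simulate(extreme)
print(freq_two_stage, freq_pooled)
```

The first frequency should land near $149/199 \approx 0.75$ and the second near $1/2$, matching the exact values for each procedure.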
