Drawing balls from an urn

balls-in-bins, probability, probability distributions, statistics

I'm preparing for my midterm and I'm having trouble connecting all the dots in the following problem:

From an urn containing 2 yellow balls, 3 blue balls and 22 red balls, one
ball is drawn, found to be red and discarded. Then 5 balls are drawn and
thrown away, their colors un-noticed. Given this information, what is the
probability that if you now draw four balls from the urn, they are all red?

From this I deduced that there are 6 cases after drawing the 5 balls: 0, 1, 2, 3, 4, or 5 red balls drawn.

I worked by summing the probabilities of all 6 cases and the answer was way off the actual solution (it should be 0.400).

(21P4 + 20P4 + 19P4 + 18P4 + 17P4 + 16P4)/(21P4)

Please excuse my bad notation!

Best Answer

After the first ball is drawn, seen to be red, and discarded, there are $21$ red balls and $5$ non-red balls remaining (the exact colors of the non-red balls don't matter).

Next, since the five balls are drawn and discarded without being looked at, this is effectively the same as never having drawn them at all, as far as the probabilities of the colors of the next few draws are concerned. As such, it is much easier to ignore this second step entirely.
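If that claim feels suspicious, a quick simulation can serve as a sanity check. The sketch below is only an illustration (the function name `simulate`, the trial count, and the seed are my own choices, not part of the problem): it estimates the chance of four reds both with and without first discarding five unseen balls, and the two estimates should come out close to each other.

```python
import random

def simulate(trials=50_000, seed=0):
    """Estimate P(four reds) with and without discarding 5 unseen balls first."""
    rng = random.Random(seed)
    # Urn after the first (observed) red ball is removed: 21 red, 5 non-red.
    urn = ["red"] * 21 + ["other"] * 5
    hits_with_discard = 0
    hits_without_discard = 0
    for _ in range(trials):
        # With the discard step: the first 5 balls are thrown away unseen,
        # and the next 4 are the ones we actually look at.
        rng.shuffle(urn)
        if all(ball == "red" for ball in urn[5:9]):
            hits_with_discard += 1
        # Without the discard step: just look at the first 4 balls.
        rng.shuffle(urn)
        if all(ball == "red" for ball in urn[:4]):
            hits_without_discard += 1
    return hits_with_discard / trials, hits_without_discard / trials

with_discard, without_discard = simulate()
print(with_discard, without_discard)
```

Both numbers are Monte Carlo estimates, so they will differ slightly from each other and from the exact value.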

Continuing, we effectively have $26$ balls remaining, of which $21$ are red and $5$ are non-red. We ask for the probability that, when selecting four of these, we get only reds. This follows the hypergeometric distribution. You can use permutations if you like, but I find it easier to use binomial coefficients. The answer is:

$$\frac{\binom{21}{4}}{\binom{26}{4}} = \frac{5985}{14950} \approx 0.400$$
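To check the arithmetic, here is a minimal Python sketch that evaluates this ratio exactly. The helper name `prob_all_red` is just something I made up for illustration; `math.comb` and `fractions.Fraction` are from the standard library.

```python
from fractions import Fraction
from math import comb

def prob_all_red(red, non_red, draws):
    """Probability that `draws` balls taken without replacement are all red."""
    total = red + non_red
    return Fraction(comb(red, draws), comb(total, draws))

# 21 red and 5 non-red balls, four draws.
p = prob_all_red(21, 5, 4)
print(p, float(p))
```

The fraction reduces to $1197/2990$, which matches the expected answer of about $0.400$.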

Your approach wasn't a bad one, but you forgot something crucial in your calculations: each term must be weighted by the probability of ending up in that specific case.

More formally, by the law of total probability, $Pr(A) = Pr(A\mid B_1)Pr(B_1)+Pr(A\mid B_2)Pr(B_2)+\dots+Pr(A\mid B_n)Pr(B_n)$ where $B_1,B_2,\dots,B_n$ form a partition of the sample space. You calculated each of $Pr(A\mid B_1),Pr(A\mid B_2),\dots$ and used these in your attempt, but you left out the weights $Pr(B_1),Pr(B_2),\dots$ entirely.

Correcting your approach then, we actually have something more like:

$$\frac{\binom{21}{0}\binom{5}{5}}{\binom{26}{5}}\times\frac{\binom{21}{4}}{\binom{21}{4}} + \frac{\binom{21}{1}\binom{5}{4}}{\binom{26}{5}}\times\frac{\binom{20}{4}}{\binom{21}{4}} + \frac{\binom{21}{2}\binom{5}{3}}{\binom{26}{5}}\times\frac{\binom{19}{4}}{\binom{21}{4}} + \dots + \frac{\binom{21}{5}\binom{5}{0}}{\binom{26}{5}}\times\frac{\binom{16}{4}}{\binom{21}{4}}$$
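If you would rather not expand that sum by hand, the following sketch adds up the six cases with exact fractions and compares the result to the direct hypergeometric ratio. The constant and function names are my own, chosen only for this illustration.

```python
from fractions import Fraction
from math import comb

RED, NON_RED = 21, 5      # urn contents after the first red ball is removed
DISCARDED, FINAL = 5, 4   # balls thrown away unseen, then balls drawn at the end

def case_sum():
    """Law of total probability over k = number of reds among the discarded balls."""
    total = RED + NON_RED
    remaining = total - DISCARDED
    result = Fraction(0)
    for k in range(DISCARDED + 1):
        # Pr(B_k): exactly k of the 5 discarded balls are red.
        p_case = Fraction(comb(RED, k) * comb(NON_RED, DISCARDED - k),
                          comb(total, DISCARDED))
        # Pr(A | B_k): all four final draws are red, given RED - k reds remain.
        p_red_given_case = Fraction(comb(RED - k, FINAL), comb(remaining, FINAL))
        result += p_case * p_red_given_case
    return result

via_cases = case_sum()
direct = Fraction(comb(RED, FINAL), comb(RED + NON_RED, FINAL))
print(via_cases, direct, via_cases == direct)
```

Using `Fraction` instead of floats keeps the comparison exact, so the equality check is not affected by rounding.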

If you go through the effort of calculating both expressions, you will find that they are equal. The second approach is much more tedious, however, so I don't recommend it.
