[Math] An urn contains blue balls and red balls. I need to find the probability of drawing more blue balls than red balls

probability

An urn contains 5 identical blue balls and 4 identical red balls. If 5 balls are taken from the urn at random, what is the probability that the number of blue balls is greater than the number of red balls?

My first idea was to list the possible color compositions of the draw:

$\color{blue} {BBBBB}$ ; $\color{blue} {BBBB}\color{red}{R}$; $\color{blue} {BBB}\color{red}{RR}$; $\color{blue} {BB}\color{red}{RRR}$; $\color{blue} {B}\color{red}{RRRR}$

Three of these cases have more blue balls than red balls, so the probability must be $\displaystyle{\frac{3}{5}}$.

But this answer seems strange to me. Is it wrong?

Could anyone help me figure out this problem?

Best Answer

Three approaches:

(1) This can be viewed as a hypergeometric distribution. The urn contains four red balls and five blue balls. Let $X$ be the number of red balls among five balls drawn at random without replacement. The number of blue balls drawn is then $5 - X$, and $5 - X > X$ exactly when $X \le 2$, so to draw more blue balls than red you need to evaluate $P(X \le 2).$ In R statistical software this can be evaluated as follows:

```r
phyper(2, 4, 5, 5)
## 0.6428571
```
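For reference, `phyper` sums the hypergeometric probabilities written out below. In R's argument order `phyper(q, m, n, k)`, here `q = 2` is the cutoff, `m = 4` is the number of red balls (the color being counted), `n = 5` is the number of blue balls, and `k = 5` is the number of balls drawn:

$$P(X = k) = \frac{{4 \choose k}{5 \choose 5-k}}{{9 \choose 5}}, \quad k = 0, 1, \dots, 4, \qquad P(X \le 2) = \sum_{k=0}^{2} P(X = k).$$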

(2) The same answer can be obtained using a combinatorial argument:

$$\frac{{4 \choose 0}{5 \choose 5}+{4\choose 1}{5 \choose 4}+{4 \choose 2}{5 \choose 3}}{{9 \choose 5}} = \frac{1 + 20 + 60}{126} = \frac{81}{126} = \frac{9}{14} \approx 0.6428571$$
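Extending the same count to $X = 3$ and $X = 4$ gives the whole distribution, which also shows where the $3/5$ in the question goes wrong: the five color patterns listed there are not equally likely, so the probability cannot be found by counting patterns.

$$\begin{aligned}
P(X=0) &= \tfrac{{4 \choose 0}{5 \choose 5}}{126} = \tfrac{1}{126} && (\text{BBBBB})\\
P(X=1) &= \tfrac{{4 \choose 1}{5 \choose 4}}{126} = \tfrac{20}{126} && (\text{BBBBR})\\
P(X=2) &= \tfrac{{4 \choose 2}{5 \choose 3}}{126} = \tfrac{60}{126} && (\text{BBBRR})\\
P(X=3) &= \tfrac{{4 \choose 3}{5 \choose 2}}{126} = \tfrac{40}{126} && (\text{BBRRR})\\
P(X=4) &= \tfrac{{4 \choose 4}{5 \choose 1}}{126} = \tfrac{5}{126} && (\text{BRRRR})
\end{aligned}$$

The five weights add up to ${9 \choose 5} = 126$, and the complement gives the same result: $1 - \frac{40 + 5}{126} = 1 - \frac{5}{14} = \frac{9}{14}$.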

(3) An approximate value (accurate to about 3 decimal places) can be obtained by simulating a million draws of five balls from such an urn and counting the red balls in each draw:

```r
set.seed(616)
m = 10^6;  urn = c(1,1,1,1,1,2,2,2,2)    # 1 = blue, 2 = red
r = replicate(m, sum(sample(urn, 5)==2)) # nr of red among 5 balls sampled without replacement
mean(r <= 2)                             # mean of logical vector is proportion of TRUEs
## 0.642822
```
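As a rough check on the claim of about 3-place accuracy, the standard error of a proportion estimated from $m = 10^6$ independent trials with true value $p = 9/14$ is

$$\text{SE} = \sqrt{\frac{p(1-p)}{m}} = \sqrt{\frac{(9/14)(5/14)}{10^6}} = \sqrt{\frac{45/196}{10^6}} \approx 0.00048,$$

so the simulated value should typically fall within about $\pm 0.001$ (two standard errors) of $9/14 \approx 0.642857$. The simulated value $0.642822$ differs from it by about $0.00004$.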

The histogram below shows the simulated hypergeometric distribution of the number of red balls drawn. The open red dots show exact hypergeometric probabilities. At the scale of the graph, it is not easy to see any difference between the simulated and exact values.


Note: You should be sure you understand and can explain the details of either method (1) or method (2) for your class. The simulation is probably not something you are expected to know.
