Conditional probability that first ball selected is blue

probability

An urn contains $r$ red balls and $b$ blue balls. $n$ balls are drawn
sequentially without replacement. If $k$ of the $n$ balls are blue, what
is the conditional probability that the first ball chosen is blue?

My attempt:

Let $A$ be the event that the first ball is blue and $B$ be the event that $k$ of the $n$ balls are blue. We want $P(A \mid B)$.

Since $k$ of the $n$ balls are blue, we can just work in the reduced sample space. But here is my confusion: out of the chosen balls, $k$ are blue and $n-k$ are red, so this is my smaller sample space. So, for the first ball to be blue, don't we just get $\frac{k}{n}$?

But in my lecture notes, the professor writes

$$ P(A \cap B) = \frac{ b {r \choose n-k } {b -1 \choose k -1 } (n-1)!}{{r+b \choose n} n! } $$

and $$P(B) = \frac{ {r \choose n - k } {b \choose k} }{{r + b \choose n} } $$

Why this much work? Or perhaps my reasoning is flawed?

Best Answer

When you draw $n$ balls sequentially, you generate a sequence of colored balls. For example, if $r=3,$ $b=4,$ and $n=5,$ possible outcomes include $bbbrr,$ $brrbb,$ $bbbbr,$ $brrrb,$ and others. (That's assuming we do not identify each blue or red ball individually; with individual identifications, $b_1b_2b_3r_1r_2$ and $b_4b_2b_3r_1r_2$ would be distinct outcomes.)

Your reduced sample space also consists of sequences of colored balls, except that in the reduced sample space you only have sequences in which $k$ of the balls are blue. So if we put $k=3$ in the previous example, then the sequences $bbbbr$ and $brrrb$ are eliminated, but $bbbrr,$ $brrbb,$ and all other sequences with exactly three $b$s are still in the reduced sample space.

You can argue that in the reduced sample space, all the outcomes are equally likely. (This is not true for the original sample space.) You can also argue by symmetry that no position in the sequence is more likely to hold a blue ball than another. Since the total number of blue balls is $k,$ by linearity of expectation the expected number of blue balls in each position in the sequence is $\frac kn.$ Since the number of blue balls in a given position is either $0$ or $1$, that expected value equals the probability that the ball in that position is blue, which is therefore $\frac kn.$
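As a sanity check on the equally-likely claim, here is a minimal brute-force sketch in Python for the small example $r=3,$ $b=4,$ $n=5,$ $k=3.$ It enumerates every ordered draw of individually labelled balls and groups the draws by color pattern. The function names `pattern_counts` and `conditional_first_blue` are made up for this illustration, and the approach only makes sense for small urns.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def pattern_counts(r, b, n):
    """Count ordered draws of n labelled balls, grouped by color pattern."""
    # Index in this list serves as the individual label of each ball.
    colors = ["r"] * r + ["b"] * b
    counts = Counter()
    for draw in permutations(range(r + b), n):
        counts["".join(colors[i] for i in draw)] += 1
    return counts


def conditional_first_blue(r, b, n, k):
    """Return P(first ball blue | exactly k blue) and the reduced sample space."""
    counts = pattern_counts(r, b, n)
    # Keep only the color patterns with exactly k blue balls.
    reduced = {p: c for p, c in counts.items() if p.count("b") == k}
    total = sum(reduced.values())
    first_blue = sum(c for p, c in reduced.items() if p[0] == "b")
    return Fraction(first_blue, total), reduced


prob, reduced = conditional_first_blue(3, 4, 5, 3)
print(prob)                   # 3/5, which is k/n
print(set(reduced.values()))  # a single value: every pattern is equally likely
```

In the full sample space the patterns are not equally likely: for instance, `pattern_counts(3, 4, 5)` gives $bbbbr$ half as many labelled draws as $bbbrr.$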

Another argument using the reduced sample space: we can select an unordered set of $k$ blue balls and $n-k$ red balls from the original $r + b$ balls, and then put them in a uniformly random order; this produces the same result (in probability) as selecting the balls in sequence without replacement. Then, given that we have selected $k$ blue balls and $n-k$ red balls, the probability that the first ball will be blue when they are put in sequence is $\frac kn.$ I think this may be the same as your idea.

Your professor's formulas work out the probabilities explicitly in a sample space that not only says what color each ball is, but also identifies each ball individually. When you divide $P(A\cap B)$ by $P(B),$ a lot of cancellation happens very quickly, and you can cancel everything but the final $\frac kn$ without too much trouble; but I think reasoning from the reduced sample space gives the answer with better intuition and less computation.
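For completeness, here is one way the cancellation can be written out, using the two formulas from the question and the identity $b {b-1 \choose k-1} = k {b \choose k}$:

$$ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{ b {r \choose n-k} {b-1 \choose k-1} (n-1)! }{ {r+b \choose n} n! } \cdot \frac{ {r+b \choose n} }{ {r \choose n-k} {b \choose k} } = \frac{ b {b-1 \choose k-1} }{ n {b \choose k} } = \frac{ k {b \choose k} }{ n {b \choose k} } = \frac kn $$

The factors ${r+b \choose n}$ and ${r \choose n-k}$ cancel outright, and $\frac{(n-1)!}{n!} = \frac 1n$ supplies the denominator.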