Probability of a certain outcome when drawing balls from an urn with 2 colors without replacement


Consider an urn containing $i$ red and $j$ blue balls. We draw $n$ balls from the urn without replacement. If $(x_1,\dots,x_n)$ denotes an outcome, how should one canonically define the probability of this outcome?

I think the probability depends on the order in which the red (or blue) balls are drawn. In other words, $RBBBBBBB$ should have a different probability from $BBBBBBBR$: intuitively, the chance of drawing a red ball at the beginning is smaller, and at the end it should be larger (since by then fewer blue balls remain).
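One way to test this intuition is to compute the probability of an ordered outcome directly by multiplying the conditional probability of each draw given the draws before it (the chain rule). The Python sketch below does this with exact fractions; the function name `sequence_probability` and the example urn of 3 red and 7 blue balls are illustrative choices, not part of the question.

```python
from fractions import Fraction

def sequence_probability(colors: str, red: int, blue: int) -> Fraction:
    """Exact probability of drawing the color sequence `colors` (e.g. "RBB")
    without replacement from an urn holding `red` red and `blue` blue balls.

    Each draw contributes a conditional probability: the number of balls of
    the drawn color still in the urn divided by the number of balls left.
    """
    p = Fraction(1)
    for c in colors:
        if c == "R":
            if red == 0:
                return Fraction(0)  # no red ball left to draw
            p *= Fraction(red, red + blue)
            red -= 1
        elif c == "B":
            if blue == 0:
                return Fraction(0)  # no blue ball left to draw
            p *= Fraction(blue, red + blue)
            blue -= 1
        else:
            raise ValueError(f"unknown color {c!r}; use 'R' or 'B'")
    return p

# Hypothetical urn: 3 red and 7 blue balls, 8 draws.
print(sequence_probability("RBBBBBBB", 3, 7))  # 1/120
print(sequence_probability("BBBBBBBR", 3, 7))  # 1/120
```

With these numbers the early red draw is indeed less likely (3/10, versus 3/3 for the final red draw), but the seven blue draws that precede the late red are correspondingly less likely, and the two products come out equal.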

I looked at the hypergeometric distribution, but that is not quite what I want.

Best Answer

For this post, I will use $n\frac{r}{~}$ to denote the falling factorial: $$n\cdot(n-1)\cdot(n-2)\cdots(n-r+1)=\frac{n!}{(n-r)!}$$
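As a concrete reference for this notation, here is a small Python sketch of the falling factorial; the helper name `falling_factorial` is illustrative. It also checks the definition against $n!/(n-r)!$ and against Python's standard `math.perm` (available from Python 3.8), which computes the same quantity.

```python
import math

def falling_factorial(n: int, r: int) -> int:
    """n * (n-1) * ... * (n-r+1), a product of r factors; equals 1 when r == 0."""
    result = 1
    for k in range(r):
        result *= n - k
    return result

# Agrees with n!/(n-r)! and with math.perm for every 0 <= r <= n checked here.
for n in range(8):
    for r in range(n + 1):
        assert falling_factorial(n, r) == math.factorial(n) // math.factorial(n - r) == math.perm(n, r)

print(falling_factorial(10, 3))  # 720
```

When $r > n$ the product contains a zero factor, so the function returns 0 (as does `math.perm`). In the probability formula this is what gives impossible color sequences, such as asking for more red balls than the urn holds, probability 0.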

I will make a few changes to notation; the reason will become clear in a moment. Let $I$ denote the total number of red balls in the urn, $J$ the total number of blue balls, and $N = I + J$ the total number of balls. We draw $n$ balls in total, and $i$ now denotes the number of red balls that were drawn (rather than the number available), while $j$ denotes the number of blue balls drawn, so that $i + j = n$.

Imagine that the balls are all uniquely labeled. Then each of the $N\frac{n}{~}$ ways of selecting $n$ balls in sequence from the $N$ available balls is equally likely.
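This equal-likelihood claim can be made concrete on a tiny urn. The Python sketch below assumes a hypothetical urn with 2 red and 3 blue labeled balls and 3 draws; it enumerates all ordered draws with `itertools.permutations` and computes the probability of one particular ordered draw with the chain rule.

```python
import itertools
import math
from fractions import Fraction

# Hypothetical small urn: I = 2 labeled red balls and J = 3 labeled blue balls.
balls = ["R1", "R2", "B1", "B2", "B3"]
N, n = len(balls), 3

# Every ordered selection of n distinct labeled balls is one possible outcome.
ordered_draws = list(itertools.permutations(balls, n))
print(len(ordered_draws), math.perm(N, n))  # 60 60

# At draw k (0-based) there are N - k balls left, so any particular labeled
# ball is drawn with probability 1/(N - k); every ordered draw therefore has
# the same probability 1 / (N * (N-1) * ... * (N-n+1)).
p_each = math.prod(Fraction(1, N - k) for k in range(n))
print(p_each)  # 1/60
```

Since all 60 ordered draws share the probability 1/60, computing the probability of any event reduces to counting the ordered labeled draws that belong to it.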

Consider a specific sequence of colors containing $i$ red balls and $j$ blue balls, and let us count how many ordered sequences of labeled balls produce this sequence of colors.

Working from left to right, decide which specific red ball occupies each position reserved for a red ball. For the first such position there are $I$ choices, for the next there are $I-1$, and so on, giving $I\frac{i}{~}$ ways to assign specific red balls to the red positions.

Similarly, working from left to right, decide which specific blue ball occupies each position reserved for a blue ball. As before, there are $J\frac{j}{~}$ ways to do this.
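The counting argument for both colors can be checked by brute force. This sketch assumes a hypothetical urn with $I = 3$ red and $J = 4$ blue labeled balls and $n = 4$ draws, tallies the color pattern of every ordered labeled draw, and compares each tally with $I\frac{i}{~}\cdot J\frac{j}{~}$ (computed with `math.perm`).

```python
import itertools
import math
from collections import Counter

# Hypothetical urn: I = 3 labeled red balls, J = 4 labeled blue balls; draw n = 4.
I, J, n = 3, 4, 4
balls = [f"R{k}" for k in range(I)] + [f"B{k}" for k in range(J)]

# For each ordered labeled draw, keep only its color pattern, e.g. "RBBR".
pattern_counts = Counter(
    "".join(ball[0] for ball in draw) for draw in itertools.permutations(balls, n)
)

# A pattern with i reds and j blues should arise from exactly
# (falling factorial of I with i factors) * (falling factorial of J with j factors) draws.
for pattern, count in sorted(pattern_counts.items()):
    i, j = pattern.count("R"), pattern.count("B")
    assert count == math.perm(I, i) * math.perm(J, j), pattern

print(pattern_counts["RBBB"], pattern_counts["BBBR"])  # 72 72
```

The pattern "RRRR" never appears, matching $3\frac{4}{~} = 0$.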

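Since every assignment of red balls can be combined with every assignment of blue balls, exactly $I\frac{i}{~}\cdot J\frac{j}{~}$ of the $N\frac{n}{~}$ equally likely labeled sequences produce this color sequence. We therefore get a probability of: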

$$\dfrac{I\frac{i}{~}\cdot J\frac{j}{~}}{N\frac{n}{~}}$$
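The formula translates directly into code. In this sketch the helper `color_sequence_probability` is a hypothetical name, the falling factorials are computed with `math.perm`, and the example urn of 3 red and 7 blue balls is arbitrary. As a sanity check, the probabilities of all $2^n$ color sequences should sum to 1.

```python
import itertools
import math
from fractions import Fraction

def color_sequence_probability(colors: str, I: int, J: int) -> Fraction:
    """Probability of an ordered color sequence such as "RBBR" (letters R/B),
    using the formula I^(i) * J^(j) / N^(n) with falling factorials.
    Only the counts i and j enter; the order of the letters does not."""
    i, j = colors.count("R"), colors.count("B")
    N, n = I + J, len(colors)
    if n > N:
        raise ValueError("cannot draw more balls than the urn holds")
    return Fraction(math.perm(I, i) * math.perm(J, j), math.perm(N, n))

# Hypothetical urn with I = 3 red and J = 7 blue balls, 8 draws.
print(color_sequence_probability("RBBBBBBB", 3, 7))  # 1/120
print(color_sequence_probability("BBBBBBBR", 3, 7))  # 1/120

# Sanity check: the probabilities of all 2**8 color sequences sum to 1.
total = sum(
    color_sequence_probability("".join(seq), 3, 7)
    for seq in itertools.product("RB", repeat=8)
)
print(total)  # 1
```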

Notice that this does not depend on the order in which the colors occurred: the sequence RBBBBB is exactly as probable as the sequence BBBBBR. It is true that the probability that the first ball is red is less than the probability that the $n$th ball is red *given that* the first $n-1$ balls were blue, but that comparison is irrelevant. The fair comparison is between the probability that the first ball is red and the probability that the $n$th ball is red, where this second probability is not conditioned on anything; these two are equal. This is similar to how drawing a queen on the first draw from a shuffled deck is exactly as likely as drawing a queen on the second draw, or indeed on the $n$th draw for any $n$.
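The claim that the unconditional probability of red is the same at every position can be checked numerically. The sketch below reimplements the formula as a hypothetical helper `color_sequence_probability`, sums it over all color sequences with a red ball in position $k$, and compares the result with the conditional probability of red on the $n$th draw after $n-1$ blues, which works out to $I/(N-n+1)$. The urn sizes are arbitrary.

```python
import itertools
import math
from fractions import Fraction

def color_sequence_probability(colors: str, I: int, J: int) -> Fraction:
    """The formula I^(i) * J^(j) / N^(n), falling factorials via math.perm."""
    i, j = colors.count("R"), colors.count("B")
    return Fraction(math.perm(I, i) * math.perm(J, j), math.perm(I + J, len(colors)))

I, J, n = 3, 7, 5  # hypothetical urn: 3 red, 7 blue; 5 draws
sequences = ["".join(s) for s in itertools.product("RB", repeat=n)]

# Unconditional P(k-th ball is red): add up every sequence with an R in position k.
for k in range(n):
    p_red_at_k = sum(color_sequence_probability(s, I, J) for s in sequences if s[k] == "R")
    print(f"position {k + 1}: {p_red_at_k}")  # 3/10 at every position

# Conditional P(n-th ball is red | first n-1 balls blue) = P(BBBBR) / P(BBBB).
first_blues = "B" * (n - 1)
p_cond = (color_sequence_probability(first_blues + "R", I, J)
          / color_sequence_probability(first_blues, I, J))
print(p_cond)  # 1/2, i.e. I / (N - n + 1), larger than I / N = 3/10
```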

From the above observations and the derived formula, we can further derive the formula for the hypergeometric distribution: since each of the $\binom{n}{i}$ possible orders of red and blue balls has the same probability, we multiply the formula by $\binom{n}{i}$.
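Writing $\binom{n}{i} = \frac{n!}{i!\,j!}$ (using $i + j = n$) and $\binom{I}{i} = \frac{I\frac{i}{~}}{i!}$, the multiplication can be carried out explicitly:

$$\binom{n}{i}\cdot\frac{I\frac{i}{~}\cdot J\frac{j}{~}}{N\frac{n}{~}} = \frac{\dfrac{I\frac{i}{~}}{i!}\cdot\dfrac{J\frac{j}{~}}{j!}}{\dfrac{N\frac{n}{~}}{n!}} = \frac{\binom{I}{i}\binom{J}{j}}{\binom{N}{n}},$$

which is the hypergeometric probability of drawing exactly $i$ red balls. The Python sketch below checks this identity with exact fractions over a range of small urns; the helper names and ranges are illustrative.

```python
import math
from fractions import Fraction

def ordered_probability(I: int, J: int, i: int, j: int) -> Fraction:
    """Probability of one particular order of i reds and j blues (falling-factorial formula)."""
    return Fraction(math.perm(I, i) * math.perm(J, j), math.perm(I + J, i + j))

def hypergeometric_pmf(I: int, J: int, n: int, i: int) -> Fraction:
    """P(exactly i red among n draws) = C(I, i) * C(J, n - i) / C(I + J, n)."""
    return Fraction(math.comb(I, i) * math.comb(J, n - i), math.comb(I + J, n))

# Check binom(n, i) * (ordered probability) == hypergeometric pmf on small urns.
for I in range(6):
    for J in range(6):
        for n in range(I + J + 1):
            for i in range(n + 1):
                lhs = math.comb(n, i) * ordered_probability(I, J, i, n - i)
                assert lhs == hypergeometric_pmf(I, J, n, i), (I, J, n, i)

print(hypergeometric_pmf(3, 7, 8, 1))  # 1/15
```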