Drawing n balls randomly, *without* replacement, from a bag containing b black balls and w white balls (Part 2)

combinations combinatorics permutations

A bag has $b$ black balls and $w$ white balls. Balls of the same color are indistinguishable.
Assume that $n \leq b$ and $n \leq w$.

  1. $z_k=$ the number of ways in which $n$ balls can be drawn randomly, without replacement from this bag containing $b$ black balls and $w$ white balls, when $k$ of the $n$ balls drawn are white

  2. $x=$ the number of ways in which $n$ balls can be drawn randomly, without replacement from this bag containing $b$ black balls and $w$ white balls

$$\implies z_k = \frac{n!}{k!(n-k)!} \quad \text{and} \quad x = \sum_{k=0}^n z_k = \sum_{k=0}^n \frac{n!}{k!(n-k)!} = \sum_{k=0}^n \binom{n}{k} = 2^n$$

This is derived here.

Now consider the question:

A bag has $b$ black balls and $w$ white balls. Balls of the same color are indistinguishable.
Assume that $n<b$ and $n<w$.
$n$ balls are drawn randomly, without replacement, from this bag of $b$ black balls and $w$ white balls.
Then the probability that $k$ of the $n$ balls are white is
$$\frac{\binom{w}{k}\binom{b}{n-k}}{\binom{w+b}{n}} = \frac{z_k}{x}$$
because treating balls of the same colour as distinguishable does not change this probability.
But according to the calculations done for $z_k$ and $x$, we have
$$\frac{z_k}{x} = \frac{n!}{k!(n-k)!2^n}$$
This is a contradiction.
So I know I have made a mistake in calculating $x$ and/or $z_k$. What did I do wrong?

Edit 1: Say a bag has $b$ black balls and $w$ white balls. You sample $n$ balls from the bag without replacement. Assume that $n \leq b$ and $n \leq w$.
Let $X$ be a random variable, where $X=k$ denotes the event that $k$ of the $n$ balls drawn are white.

  1. Then $X$ follows the hypergeometric distribution, right?
  2. Also, since it isn't specified whether balls of the same color are distinguishable, we can assume them to be either distinguishable or indistinguishable, right?
  3. Assuming that the balls of the same color are distinguishable,
    $$P(X=k) = \frac{\text{no. of ways of getting $k$ white and $n-k$ black balls}}{\text{no. of ways of getting n balls}} = \frac{\binom{w}{k}\binom{b}{n-k}}{\binom{w+b}{n}}$$
  4. Assuming that the balls of the same color are indistinguishable,
    $$P(X=k) = \frac{\text{no. of ways of getting $k$ white and $n-k$ black balls}}{\text{no. of ways of getting n balls}} =\frac{z_k}{x}= \frac{n!}{k!(n-k)!2^n}$$
  5. @David K, you are saying that the $\frac{z_k}{x}$ part is wrong. But then what is the correct "no. of ways of getting $k$ white and $n-k$ black balls" and the correct "no. of ways of getting n balls" when balls of the same color are indistinguishable?

Best Answer

Your previous question asked about combinatorics, not probability. You found the number of distinguishable outcomes (assuming balls of the same color are indistinguishable but different sequences of black and white are distinguished).

At no point did you ask whether these outcomes were equally likely.

When you count the outcomes under the assumption that the balls are all distinguishable, you get a set of outcomes each of which is equally likely.

When you make the same-color balls indistinguishable again, you reduce the number of outcomes by combining some outcomes together. But some of the "indistinguishable" outcomes contain more of the "distinguishable" outcomes than others. Hence you get a non-uniform distribution over outcomes.
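The grouping described above can be checked by brute force on a small bag. The following is a minimal sketch, assuming a hypothetical helper `pattern_counts` and an illustrative bag with $b=2$, $w=3$, $n=2$ (values chosen only for this example). It labels every ball, enumerates all equally likely ordered draws, and counts how many of them collapse into each color pattern.

```python
from collections import Counter
from itertools import permutations


def pattern_counts(b, w, n):
    """Count how many ordered draws of labeled balls give each color pattern.

    Hypothetical helper for illustration: balls are made distinguishable by
    attaching an index, so every ordered draw is equally likely.
    """
    balls = [("B", i) for i in range(b)] + [("W", i) for i in range(w)]
    counts = Counter()
    for draw in permutations(balls, n):
        # Forget the labels and keep only the color sequence, e.g. "BW".
        counts["".join(color for color, _ in draw)] += 1
    return counts


counts = pattern_counts(b=2, w=3, n=2)
for pattern in sorted(counts):
    print(pattern, counts[pattern])
# BB 2
# BW 6
# WB 6
# WW 6
```

In this example the pattern `BB` absorbs 2 of the 20 labeled draws while each of the other patterns absorbs 6, so the four "indistinguishable" outcomes are not equally likely.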

If there is some hidden mechanism in the bag that causes each draw to be black or white with equal probability as long as there are balls of each kind remaining in the bag, then your "indistinguishable" outcomes become equally likely and the "distinguishable" outcomes not equally likely. But usually we assume that the probabilities of black and white are proportional to the numbers of black and white balls remaining.


I think a lot of the continued confusion is that you are trying to take the answers to combinatorics questions and plug them directly into the numerator and denominator of a probability. This works only in very specific, limited cases.

You looked at the case where the balls are indistinguishable but the sequence of draws matters; that is, it makes a difference if we swap a white ball with a black one in the result ($BBBW$ is a different outcome from $BWBB$) but it doesn't make any difference if we swap two black balls. And indeed then we have:

  • $\binom nk$ different ways to draw $k$ white balls and $n-k$ black balls;
  • $2^n$ different ways to draw $n$ balls.

Where things go wrong is when you present the following "equation":

$$P(X=k) \stackrel?= \frac{\text{no. of ways of getting $k$ white and $n-k$ black balls}}{\text{no. of ways of getting n balls}}$$

If we are counting numbers of ways to draw indistinguishable balls from a bag, the two sides of this "equation" are not equal in general.

Let's consider a concrete example: $n=2,$ $b = 3,$ $w = 997,$ $k = 0.$ Then $P(X=k)$ is the probability that we draw two black balls and no white ones, even though $997$ of the $1000$ balls in the bag were white.

To get $X=0$ we have to draw a black ball on the first draw; and then, when there are only $2$ black balls left in the bag, we have to draw another. Writing $B_1$ for the event that the first ball is black, $B_2$ that the second ball is black, the probability is

$$ P(X=0) = P(B_1 \cap B_2) = P(B_1) P(B_2\mid B_1) = \frac3{1000} \times \frac2{999} = \frac1{166500}. $$

Notice that there are $4$ ways to draw two balls ($BB,$ $BW,$ $WB,$ $WW$) and only one way to draw zero white balls ($BB$), but the probability of zero white balls is not $\frac14.$

If you add another million white balls to the bag at the start of the exercise, you will get an even smaller probability of zero white balls among the two balls you draw.
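The numbers in this example are easy to confirm with exact arithmetic. Here is a small sketch, assuming a hypothetical function `prob_all_black` that multiplies the conditional probabilities of drawing black on each successive draw; it is only an illustration of the calculation above, not a general-purpose routine.

```python
from fractions import Fraction


def prob_all_black(b, w, n):
    """Probability that all n draws (without replacement) are black.

    Hypothetical helper: multiplies P(black on draw i+1 | i blacks drawn so far).
    """
    p = Fraction(1)
    for i in range(n):
        p *= Fraction(b - i, b + w - i)
    return p


p = prob_all_black(b=3, w=997, n=2)
naive = Fraction(1, 2**2)  # "1 way out of 4", the ratio of counts
more_white = prob_all_black(b=3, w=997 + 1_000_000, n=2)

print(p)               # 1/166500
print(naive)           # 1/4
print(more_white < p)  # True
```

The ratio of counts stays at $\frac14$ no matter how many white balls are in the bag, while the actual probability shrinks as white balls are added.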

Also note that you get $2^n$ possible outcomes only if you count different sequences of balls as distinct. What if you not only cannot tell the balls apart, but also cannot say which one was drawn "before" another? Then you have only $n+1$ possible outcomes, and for any given $k$ you have only one outcome with $k$ white balls.
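The three counts mentioned so far ($2^n$ sequences, $\binom nk$ sequences with $k$ white balls, and $n+1$ unordered outcomes) can be enumerated directly. This is a rough sketch with a hypothetical helper `count_outcomes`; it assumes $n \leq b$ and $n \leq w$, as in the question, so that every color sequence is actually possible.

```python
from itertools import product
from math import comb


def count_outcomes(n):
    """Count color outcomes of n draws when same-colored balls look alike.

    Hypothetical helper; assumes n <= b and n <= w so no sequence is excluded.
    Returns (number of sequences, sequences per white count, unordered outcomes).
    """
    sequences = list(product("BW", repeat=n))
    by_white_count = {}
    for seq in sequences:
        k = seq.count("W")
        by_white_count[k] = by_white_count.get(k, 0) + 1
    # Sorting each sequence discards the order in which balls were drawn.
    unordered = {tuple(sorted(seq)) for seq in sequences}
    return len(sequences), by_white_count, len(unordered)


total, by_k, unordered_total = count_outcomes(4)
print(total)            # 16
print(by_k)             # {0: 1, 1: 4, 2: 6, 3: 4, 4: 1}
print(unordered_total)  # 5
```

Neither way of counting says anything about how likely each outcome is; that depends on $b$ and $w$, which do not appear in these counts at all.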


In summary, in Edit1, parts 1, 2, and 3 are correct. In part 4 the first equality sign is wrong; otherwise that part is correct. In part 5, you have already shown the correct numbers of ways to draw indistinguishable balls from a bag (when the sequence of drawing matters). The mistake is the idea that these numbers tell you anything about a probability.

The question you should ask is, "How do I correctly compute the probability when the balls are indistinguishable?"

One way is to compute it using the sequence of draws and conditional probability as I did above, where the probability of white on the $m$th ball depends on what you drew previously. But this way you have to consider the fact that each of the $\binom nk$ ways to get a sequence of $k$ indistinguishable white balls and $n-k$ indistinguishable black balls has a different set of conditional probabilities to multiply. For example, with $n=2,$ $b = 3,$ $w = 997,$ $k = 1,$

\begin{align} P(X=1) &= P((B_1 \cap W_2)\cup(W_1 \cap B_2)) \\ &= P(B_1) P(W_2\mid B_1) + P(W_1) P(B_2\mid W_1) \\ &= \frac3{1000}\times \frac{997}{999} + \frac{997}{1000}\times \frac3{999}\\ &= \frac{997}{333000} + \frac{997}{333000} \\ &= \frac{997}{166500}. \end{align}
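The same sum over sequences can be carried out mechanically. Below is a minimal sketch, assuming two hypothetical helpers: `sequence_probability` multiplies the conditional probabilities along one color sequence, and `prob_k_white` adds these up over all sequences with exactly $k$ white balls.

```python
from fractions import Fraction
from itertools import product


def sequence_probability(seq, b, w):
    """Probability of drawing the exact color sequence seq without replacement."""
    p = Fraction(1)
    for color in seq:
        remaining = b + w
        if color == "W":
            p *= Fraction(w, remaining)
            w -= 1
        else:
            p *= Fraction(b, remaining)
            b -= 1
    return p


def prob_k_white(n, k, b, w):
    """P(X = k): sum the probabilities of all length-n sequences with k whites."""
    return sum(
        sequence_probability(seq, b, w)
        for seq in product("BW", repeat=n)
        if seq.count("W") == k
    )


print(sequence_probability("BW", b=3, w=997))  # 997/333000
print(sequence_probability("WB", b=3, w=997))  # 997/333000
print(prob_k_white(n=2, k=1, b=3, w=997))      # 997/166500
```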

Now you might notice that when you multiply the conditional probabilities in each case, although the individual probabilities you multiply are all different, the product is always the same. That's because we always have the same denominators and the same numerators, though they may occur in different sequences. Another way to see this is to observe that

$$ P(W_1 \cap B_2) = P(B_2) P( W_1\mid B_2) = P(B_1) P(W_2\mid B_1) = P(B_1 \cap W_2). $$

But whatever way you figure it out, if you realize that each sequence with $k$ white balls has probability

$$ \frac{w(w-1)\cdots(w-k+1) \times b(b-1)\cdots (b - n+k+1)} {(w+b)(w+b-1)\cdots(w+b-n+1)} = \frac{\binom wk k! \times \binom b{n-k} (n-k)!}{\binom{w+b}{n} n!} $$

and that there are $\binom nk$ different sequences, when you add together the probabilities of all sequences with $k$ white balls you get $$ \frac{\binom wk \binom b{n-k}}{\binom{w+b}{n}}.$$
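As a sanity check on this algebra, the per-sequence probability times $\binom nk$ can be compared with the hypergeometric formula using exact fractions. This is only a sketch: the function names are hypothetical, the values $n=4$, $b=5$, $w=6$ are arbitrary, and `math.perm` (Python 3.8+) is used for the falling factorials $w(w-1)\cdots(w-k+1)$.

```python
from fractions import Fraction
from math import comb, perm


def per_sequence_probability(n, k, b, w):
    """Probability of any one fixed sequence with k white and n-k black balls."""
    # perm(w, k) = w(w-1)...(w-k+1), and similarly for the other factors.
    return Fraction(perm(w, k) * perm(b, n - k), perm(w + b, n))


def hypergeometric_pmf(n, k, b, w):
    """P(X = k) for the hypergeometric distribution."""
    return Fraction(comb(w, k) * comb(b, n - k), comb(w + b, n))


n, b, w = 4, 5, 6
for k in range(n + 1):
    summed = comb(n, k) * per_sequence_probability(n, k, b, w)
    print(k, summed, summed == hypergeometric_pmf(n, k, b, w))
```

For every $k$ the two expressions agree, since $\binom nk \, k! \, (n-k)! = n!$ cancels the factorials introduced by the ordering.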


A way I think of this intuitively is that we are modeling a world in which writing a number on a ball or erasing the number does not cause that ball to magically run away from you when you reach into the bag, nor to jump into your hand. In fact the distinguishing marks (or lack thereof) on the white balls have no effect on the probability of drawing a white ball each time, and likewise with the black balls. So a correct way to compute $P(X=k)$ with indistinguishable balls is to compute $P(X=k)$ with distinguishable balls and simply copy the final result. This yields the same formulas shown in the previous few paragraphs.

The calculation is even simpler if you realize that it has no effect on $P(X=k)$ if you choose the balls and (before looking at any of them) mix the chosen balls together so you cannot tell which was drawn first. That is, the sequence of drawing also does not matter. Then you can derive the hypergeometric distribution almost immediately.
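This unordered view can also be checked by enumeration. The sketch below assumes a hypothetical helper `pmf_by_enumeration` and an illustrative bag with $b=4$, $w=5$, $n=3$: it labels the balls, lists every unordered set of $n$ of them (each equally likely), and counts the sets with exactly $k$ white balls.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def pmf_by_enumeration(n, k, b, w):
    """P(X = k) by counting equally likely unordered sets of labeled balls."""
    colors = ["W"] * w + ["B"] * b  # position in the list serves as the label
    hands = list(combinations(range(len(colors)), n))
    favorable = sum(
        1 for hand in hands if sum(colors[i] == "W" for i in hand) == k
    )
    return Fraction(favorable, len(hands))


n, b, w = 3, 4, 5
for k in range(n + 1):
    formula = Fraction(comb(w, k) * comb(b, n - k), comb(w + b, n))
    print(k, pmf_by_enumeration(n, k, b, w), formula)
```

Here the ratio of counts is a valid probability because the outcomes being counted, unordered sets of labeled balls, are equally likely.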