[Math] Box with N two-color balls, randomly chosen, with and without returning

probability

Suppose there is a box with $N$ balls, $k$ white and $N-k$ black.

  1. If each chosen ball is returned to the box, then the probabilities of drawing a white (black) ball on any single draw, regardless of how many balls are drawn in total, are $$p(\text{white})=\frac{k}{N},\quad p(\text{black}) = \frac{N-k}{N} = 1-p(\text{white}).$$
  2. What if each chosen ball is kept outside the box and drawing continues until all balls have been selected? I am aware that this is a basic course question; I am hoping for an answer or a reference, to be more certain.

  3. A third way of selecting balls is possible. Suppose there is a simple device: a metal "matrix" with $N$ holes, in the shape of the box. The matrix is placed on top of the box, which is then turned upside down, and we somehow ensure that each hole lets exactly one ball through. Then:

    • The process is similar to case 2, because balls are not returned to the box after being chosen.
    • The probability that a given hole in the matrix lets through a white (black) ball is the same as in case 1, the basic case.
    • This means that what influences the probabilities in case 2 is not the act of not returning a ball; it is the act of receiving information about which ball was chosen and is therefore missing from the box.
    • So, if that information is not recorded, are the probabilities in case 2 the same as in case 1?
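A quick Monte Carlo sketch in Python can compare the three cases. It is only illustrative: the name `simulate` is hypothetical, and modelling the device in case 3 as a uniformly random assignment of balls to holes (a shuffle) is an assumption, since the question only says that each hole passes exactly one ball.

```python
import random


def simulate(N: int, k: int, i: int, trials: int = 50_000, seed: int = 0) -> dict:
    """Estimate P(the i-th ball is white) under cases 1-3 (i is 1-based, i <= N)."""
    rng = random.Random(seed)
    balls = ["W"] * k + ["B"] * (N - k)
    hits = {"case 1": 0, "case 2": 0, "case 3": 0}
    for _ in range(trials):
        # Case 1: with replacement, so every draw picks from the full box.
        draws = [rng.choice(balls) for _ in range(i)]
        hits["case 1"] += draws[-1] == "W"
        # Case 2: without replacement; each drawn ball is removed from the box.
        box = balls.copy()
        for _ in range(i):
            ball = box.pop(rng.randrange(len(box)))
        hits["case 2"] += ball == "W"
        # Case 3 (assumed model): the device assigns balls to holes uniformly
        # at random, i.e. a random shuffle; look at hole number i.
        holes = balls.copy()
        rng.shuffle(holes)
        hits["case 3"] += holes[i - 1] == "W"
    return {case: count / trials for case, count in hits.items()}


# N = 5, k = 2: all three estimates should be close to k/N = 0.4.
print(simulate(N=5, k=2, i=4))
```

A simulation only gives approximate frequencies; it suggests, but does not prove, that the three cases agree on the unconditional probability.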

Best Answer

Suppose that we do not return balls to the box. The plain (unconditional) probability that the $i$-th ball selected is white is $\frac{k}{N}$, exactly as in the case of returning the ball to the box.

One way of seeing this is to number the balls, white from $1$ to $k$, black from $k+1$ to $N$. Imagine that we draw the balls, one at a time, until all the balls are gone. All $N!$ permutations of the labels are equally likely, and the fraction of these permutations for which the $i$-th ball drawn is white is $\dfrac{k}{N}$: there are $k$ white labels that can occupy position $i$ and $(N-1)!$ ways to order the remaining labels, giving $\dfrac{k\,(N-1)!}{N!}=\dfrac{k}{N}$. (It can take a while until this fact becomes "obvious"!)
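For small $N$ this count can be checked by brute force. The sketch below (Python; the function name `prob_ith_white` is just illustrative) lists all $N!$ drawing orders of the labelled balls and counts those with a white label in position $i$.

```python
from fractions import Fraction
from itertools import permutations


def prob_ith_white(N: int, k: int, i: int) -> Fraction:
    """Exact P(the i-th ball drawn is white) when drawing without replacement.

    Balls are labelled 1..N; labels 1..k are white, k+1..N are black.
    Each permutation of the labels is one complete, equally likely drawing order.
    The position i is 1-based.
    """
    orders = list(permutations(range(1, N + 1)))
    favourable = sum(1 for order in orders if order[i - 1] <= k)
    return Fraction(favourable, len(orders))


# N = 5 balls, k = 2 white: every position gives 2/5.
for i in range(1, 6):
    print(i, prob_ith_white(5, 2, i))
```

Enumeration is only practical for small $N$ (there are $N!$ orders), but that is enough to see that the answer does not depend on $i$.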

But when drawing one at a time, there are other probabilities we can compute, for example the probability that the $3$rd ball drawn is white given that the first two were white. This conditional probability is not $\dfrac{k}{N}$; it is $\dfrac{k-2}{N-2}$ (assuming $k \ge 2$, so that the condition can occur; the two values coincide only in the trivial case $k = N$).
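The same brute-force idea shows the difference between the unconditional and the conditional probability. In this Python sketch (the name `third_ball_probs` is illustrative), conditioning simply means keeping only the drawing orders whose first two balls are white and counting within them.

```python
from fractions import Fraction
from itertools import permutations


def third_ball_probs(N: int, k: int) -> tuple[Fraction, Fraction]:
    """Return (P(3rd white), P(3rd white | 1st and 2nd white)), no replacement.

    Labels 1..k are white; every permutation of 1..N is an equally likely order.
    Requires N >= 3 and k >= 2 so that the conditioning event is possible.
    """
    def is_white(label: int) -> bool:
        return label <= k

    orders = list(permutations(range(1, N + 1)))
    unconditional = Fraction(sum(is_white(o[2]) for o in orders), len(orders))
    # Conditioning = keeping only the orders whose first two balls are white.
    given = [o for o in orders if is_white(o[0]) and is_white(o[1])]
    conditional = Fraction(sum(is_white(o[2]) for o in given), len(given))
    return unconditional, conditional


# N = 6, k = 3: k/N = 1/2 versus (k-2)/(N-2) = 1/4.
print(third_ball_probs(6, 3))  # (Fraction(1, 2), Fraction(1, 4))
```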

Suppose again that we draw the balls, one at a time, until they are all gone. The conditional probability that the $3$rd ball drawn is white, given that the last two balls drawn (of the $N$) are white, is also $\dfrac{k-2}{N-2}$. So it is not the temporal order of the drawing that matters in evaluating the conditional probability.
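To see that the timing of the condition is irrelevant, one can condition on any two positions other than the $3$rd, earlier or later, and get the same answer. A Python sketch with an illustrative helper `cond_white`, again by exhaustive enumeration for small $N$:

```python
from fractions import Fraction
from itertools import combinations, permutations


def cond_white(N: int, k: int, target: int, given: tuple[int, ...]) -> Fraction:
    """P(ball at position `target` is white | balls at all `given` positions are white).

    Positions are 1-based; labels 1..k are white. The condition restricts the
    equally likely orders to those where every `given` position holds a white label.
    """
    restricted = [o for o in permutations(range(1, N + 1))
                  if all(o[p - 1] <= k for p in given)]
    favourable = sum(1 for o in restricted if o[target - 1] <= k)
    return Fraction(favourable, len(restricted))


N, k = 6, 3
others = [p for p in range(1, N + 1) if p != 3]
for pair in combinations(others, 2):
    # (1, 2) is "first two white"; (5, 6) is "last two white".
    print(pair, cond_white(N, k, target=3, given=pair))  # 1/4 for every pair
```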

In computing a conditional probability, what matters is not the act of receiving information; it is the act of using that information to restrict the sample space.

When we draw with replacement, conditional and unconditional probabilities are the same: every draw is made from the same full box, so the draws are independent.
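Repeating the count for the example of the $3$rd ball being white given that the first two were white, but now with replacement, shows the two numbers agreeing. This Python sketch (the function name is illustrative) enumerates all $N^3$ equally likely sequences of three draws.

```python
from fractions import Fraction
from itertools import product


def third_ball_probs_with_replacement(N: int, k: int) -> tuple[Fraction, Fraction]:
    """Return (P(3rd white), P(3rd white | 1st and 2nd white)) with replacement.

    With replacement, three draws form any sequence in {1..N}^3, and all
    N**3 sequences are equally likely. Labels 1..k are white (k >= 1).
    """
    def is_white(label: int) -> bool:
        return label <= k

    seqs = list(product(range(1, N + 1), repeat=3))
    unconditional = Fraction(sum(is_white(s[2]) for s in seqs), len(seqs))
    # Restrict to sequences whose first two draws are white, then count.
    given = [s for s in seqs if is_white(s[0]) and is_white(s[1])]
    conditional = Fraction(sum(is_white(s[2]) for s in given), len(given))
    return unconditional, conditional


print(third_ball_probs_with_replacement(6, 3))  # (Fraction(1, 2), Fraction(1, 2))
```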