N white and k black balls in m boxes probability of co-occurrence

probability

n white and k black balls are randomly and independently distributed amongst m boxes. There is no limit to the number of balls a box can contain.

As a result, there are four possible states for each box:

  1. Empty
  2. black only
  3. white only
  4. black and white

What is the expected fraction of boxes in each state given n, m and k?

Obviously the expected distribution of states depends on the expected frequency of boxes containing at least one black or one white ball, but I do not even know how to calculate that, given that there are no restrictions on the number of balls in a box.

I had calculated this without taking into account the possibility that multiple balls of the same colour would end up in the same box. In that case it would simply be:

  1. 1 – α – β + α∗β
  2. α − α∗β
  3. β − α∗β
  4. α∗β

with α = k/m and β = n/m. But this is obviously wrong if there is no limit on the number of balls per box.

Best Answer

It is important to be clear about the process for distributing balls between boxes. Just clearing this up can help you think about how to work out the probabilities.

Let's assume the procedure for assigning balls to boxes "randomly and independently" is this:

  • Consider each of the $(n + k)$ balls, one at a time.
  • For each ball, select 1 of the $m$ boxes uniformly at random (regardless of its contents).
  • Place the ball in the box.
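The procedure above can be sketched as a small simulation. This is only an illustration under the stated assumption of uniform, independent placement; the function name `simulate_box_states` and the state labels are made up for this example and do not come from the question.

```python
import random


def simulate_box_states(n, k, m, seed=None):
    """Drop n white and k black balls into m boxes and count boxes per state.

    Assumes each ball independently picks one of the m boxes uniformly
    at random, regardless of what the box already contains.
    """
    rng = random.Random(seed)
    has_white = [False] * m
    has_black = [False] * m

    # Place the white balls, then the black balls; the order does not matter
    # because every ball chooses its box independently.
    for _ in range(n):
        has_white[rng.randrange(m)] = True
    for _ in range(k):
        has_black[rng.randrange(m)] = True

    counts = {"empty": 0, "black_only": 0, "white_only": 0, "both": 0}
    for white, black in zip(has_white, has_black):
        if white and black:
            counts["both"] += 1
        elif white:
            counts["white_only"] += 1
        elif black:
            counts["black_only"] += 1
        else:
            counts["empty"] += 1
    return counts
```

Dividing each count by `m` and averaging over many runs gives an empirical estimate of the expected fraction of boxes in each state.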

Case 1: What is the probability that a particular box is empty?

To be empty, the box must be missed by all $(n + k)$ balls in the procedure above. The probability that a single ball misses this box is $$\frac{m-1}{m}$$ Think of this as the probability that this box got missed, or equivalently that a different box got selected. Because the balls are placed independently, the probability that all $(n+k)$ balls miss this box is $$\left(\frac{m-1}{m}\right)^{n+k}$$
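As a quick numerical illustration, with values chosen only for this example ($m = 3$ boxes, $n = 2$ white balls and $k = 1$ black ball), the probability that a given box stays empty is

$$\left(\frac{3-1}{3}\right)^{2+1} = \left(\frac{2}{3}\right)^{3} = \frac{8}{27} \approx 0.296$$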

Case 2: What is the probability that a particular box contains only black balls?

To have at least one black ball and no white balls we require both that all white balls missed the box, and that not all black balls missed the box. Because every ball is placed independently, these two events are independent and their probabilities multiply. This gives the product below, where the first factor is the probability that all white balls miss the box and the second factor is the probability that not all black balls miss the box.

$$\left(\frac{m-1}{m}\right)^{n}\left(1-\left(\frac{m-1}{m}\right)^{k}\right)$$

Case 3: What is the probability that a particular box contains only white balls?

The reasoning here is the same as in Case 2, but with "black" and "white" swapped. Not all white balls can miss this box, but all black balls need to miss the box.

$$\left(1-\left(\frac{m-1}{m}\right)^{n}\right)\left(\frac{m-1}{m}\right)^{k}$$

Case 4: What is the probability that a particular box contains black and white balls?

Here we need the probability that not all white balls miss this box, and also that not all black balls miss it. Then the box contains at least 1 white and at least 1 black ball.

$$\left(1-\left(\frac{m-1}{m}\right)^{n}\right)\left(1-\left(\frac{m-1}{m}\right)^{k}\right)$$

Summary: The 4 cases above are mutually exclusive and cover every possibility, so their probabilities sum to 1.
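The sum can be checked directly. Writing $a$ and $b$ (shorthand introduced only for this check) for the probabilities that all white balls and all black balls miss the box, so that $a = \left(\frac{m-1}{m}\right)^{n}$ and $b = \left(\frac{m-1}{m}\right)^{k}$, the four cases add up as

$$ab + a(1-b) + (1-a)b + (1-a)(1-b) = a + (1-a) = 1$$

where the first term is Case 1, since $\left(\frac{m-1}{m}\right)^{n+k} = ab$.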

Since the expected fraction of boxes in each state is the same as the probability of a box being in each of the possible states, the four probabilities above answer the question.

To get the expected number of boxes in each state you would need to multiply each probability by $m$, the number of boxes.
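For concrete numbers, the four formulas can be evaluated with a few lines of Python. This is a minimal sketch; the function names `state_probabilities` and `expected_counts` and the dictionary keys are chosen for this example only.

```python
def state_probabilities(n, k, m):
    """Probability that a given box is in each of the four states.

    n: number of white balls, k: number of black balls, m: number of boxes.
    Assumes every ball independently picks a box uniformly at random.
    """
    miss = (m - 1) / m      # probability that one ball misses the box
    no_white = miss ** n    # all n white balls miss the box
    no_black = miss ** k    # all k black balls miss the box
    return {
        "empty": no_white * no_black,
        "black_only": no_white * (1 - no_black),
        "white_only": (1 - no_white) * no_black,
        "both": (1 - no_white) * (1 - no_black),
    }


def expected_counts(n, k, m):
    """Expected number of boxes in each state: m times each probability."""
    return {state: m * p for state, p in state_probabilities(n, k, m).items()}


if __name__ == "__main__":
    print(state_probabilities(1, 1, 2))
    print(expected_counts(1, 1, 2))
```

With $n = k = 1$ and $m = 2$, each state has probability $0.25$, so the expected number of boxes in each state is $0.5$.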