Constrained combinatorial question: 2 types of balls divided into k groups with limits

binomial-coefficientscombinatoricsdiscrete mathematicsgenerating-functionsprobability

Let's say we have two types of balls, black and white. There are $B$ black balls and $W$ white balls, s.t. $B + W = N$ where $N$ is the total number of balls.

We want to divide these $N$ balls evenly into $k$ groups, each with integer size $n = \frac{N}{k}$ balls per group. We further require that there are fewer than $l$ black balls per group. That is, if $b_i$ is the number of black balls in group $i$, then $b_i < l, 1 \leq i \leq k$.

The question is: What is the probability that no group violates the black ball limit if the balls are distributed randomly?

The motivation for this question is rooted in the concept of sharding a distributed state machine, where you have $N$ nodes, of which $B$ are Byzantine (potentially faulty), and you want to divide the nodes into $k$ groups, none of which can have more than $l$ Byzantine nodes for the system to remain secure. Existing sharding architectures mostly assume that the distribution of nodes follows a cumulative binomial distribution (e.g. Ethereum sharding), but this is clearly wrong as sampling is done without replacement.

Using the correct base hypergeometric distribution, it is straightforward to solve for the joint pdf:
\begin{equation}
P(B_1 = b_1, \ldots, B_{k} = b_{k})
= \frac{1}{\binom{N}{B}} \cdot \prod_{j=1}^{k} \binom{n}{b_j}
\end{equation}

where $B_i$ is the random variable whose outcomes $b_i$ are the number of black balls in shard $i$.

We can then solve for the CDF, but end up with a nasty sum:

\begin{equation}
P(B_1 \leq l, \ldots B_{k} \leq l)
= \frac{1}{\binom{N}{B}} \cdot \overbrace{\sum_{b_1=0}^l…\sum_{b_k=0}^l}^{\sum_i^k b_i = B} \prod_{i=0}^l \binom{n}{b_i}
\end{equation}

This CDF evokes generalized Vandermonde's Identity, but that gives the unconstrained solution.

My current approach is as follows:

  1. Count the number of ways to divide the $B$ black balls into $k$ bins with no more than $l$ black balls in any bin. This is essentially this answer that uses a combinatorial PIE, or equivalently using a constrained generating function.
  2. Count the number of ways to divide the $B$ black balls into $k$ bins with no more than $n$ black balls in any bin, by the same logic.
  3. Divide 1 (the total number of correct solutions) by 2 (the total number of possible solutions) to get the probability

My thinking is that once the $B$ black balls are assigned, then the $W$ white ball placement is already determined: $w_i = n – b_i$.

But this solution does not match simulated results. For example, for $N = 30$, $B=3$, $k = 5$, and $l = 2$, the above logic says the probability of no limit being broken is 85.7%, while simulation indicates it is 97.6%.

Is there a closed form solution to this probability? Thanks!

Best Answer

We have $30$ balls, of which $3$ are black. The balls are divided into five groups of six each. Now, the balls are distributed "randomly", but there are many different ways to randomize things, and they produce different answers. In this case, it seems reasonable to assume that the balls are set out in a row like a shuffled deck of cards, so that each of the $\binom{30}3$ distinguishable arrangements are equally likely, and that we then divide them into groups of six going from left to right. This procedure is appropriate for many applications. I cannot say whether it is the best for your application.

The condition is that each group have no more than two black balls.

  • There is $\binom60=1$ way that a group can have no black balls.
  • There are $\binom61=6$ ways that a group can have one black ball.
  • There are $\binom62=15$ ways that a group can have two black balls.

Thus each group can be represented by the generating function $1+6x+15x^2$.

There are five groups, so the generating function to count all layouts which obey the condition is

$$(1+6x+15x^2)^5$$

The expansion begins

$$1+30x+435x^2+3960x^3+24930x^4+\ldots$$ which tells us that there is one layout with no black balls, thirty layouts with one black ball, $435$ layouts with two black balls, etc. In your example we get $3960$ layouts, and on dividing by $\binom{30}3=4060$, the result is a probability of $0.97537...$

In general, the coefficients can be calculated with the multinomial theorem. Here the coefficient for $x^p$, where $p$ represents the number of black balls, is

$$\sum_{i+j+k=5;\ j+2k=p} \binom{5}{i,j,k}1^i6^j15^k$$

where $\binom{5}{i,j,k}=\frac{5!}{i!j!k!}$. Since this example involves a trinomial, $1+6x+15x^2$, we have three variables $i,j,k$ and we need to find non-negative integer solutions of the system $i+j+k=5;\ j+2k=p$. For $p=3$ the solutions are $(3,1,1)$ and $(2,3,0)$ so the sum is

$$20\cdot 1^3\cdot 6^1\cdot 15^1+10\cdot 1^2\cdot 6^3\cdot 15^0=1800+2160=3960$$

Larger examples can be done by a computer algebra system, or just any package which can multiply polynomials together.