[Math] Balls and bins variation

co.combinatoricspr.probability

How many balls have to be thrown uniformly at random into $m$ bins, such that with high probability $n_1, n_2, \dots, n_m$ are distinct numbers, where $n_i$ is the number of balls in bin $i$ ?

Is there anything known about this problem? A trivial lower bound is $m(m-1)/2$, as we need $m$ distinct values.

Best Answer

This happens whenever $n\gg m^5$. To see this, notice that the expected number of balls in each bin is $n/m$ and the variance is also on the order of $n/m$. The distribution "tends" to N(n/m,n/m) (in the sense of CLT). We also have a local CLT, meaning that the for values $k\in\{n/m-\sqrt{n/m},\ldots,n/m+\sqrt{n/m}\}$ we have that $\mathbb{P}(n_1=k)\approx \sqrt{m/n}$.

So this means that we have a birthday paradox kind of setting where we have roughly $m$ numbers roughly uniformly distributed among roughly $\sqrt{n/m}$ values. In this scenario with high probability you have no collisions exactly when $m \ll \sqrt{\sqrt{n/m}}$, i.e. when $n \gg m^5$.

There is a technicality here, the values $n_i$ are not independent. But they are almost independent, so one can get the desired result by a second moment argument on the number of collisions.

EDIT: one side of the birthday paradox argument is easier to get since it involves only a union bound. If we look at two specific bins than asymptotically we see two independent Poisson($n/m$) random variables and the probability of collision is $K \sqrt{m/n}$ for some constant $K$ (which can be explicitly computed). We have ${m \choose 2}$ pairs of bins so the probability of collision can be asymptotically bounded by $K \sqrt{m^5/n}$. I expect the lower bound to be of the same order of magnitude.

Related Question