[Math] Balls and bins conditioned on the number of non-empty bins

balls-in-binsprobability

The expected number of occupied bins in the standard balls and bins problem (with $m$ balls into $N$ bins) is $N\left( {1 – {{\left( {1 – \frac{1}{N}} \right)}^m}} \right)$. How do I compute this expectation given that at least $k$ bins are non-empty after the $m$ balls were thrown?

In particular, how do I formulate the probability that some $i^{th}$ bin is empty given that at least $k$ bins are non-empty? in addition, it seems obivious, but how do I prove that the conditional expectation is a monotonic increasing function of $k$?

Best Answer

The standard argument to get the expected number of bins occupied is greatly simplified by the linearity of expectation. We only need to compute the probability that a single bin is filled. This is simplified by computing the probability that that bin is empty: $\left(1-\frac1n\right)^m$. Thus, the probability that that bin is filled is $1-\left(1-\frac1n\right)^m$. Linearity of expectation says that the expected number of bins filled would be $$ n\left(1-\left(1-\frac1n\right)^m\right)\tag{1} $$ as you have stated.

However, when assuming that at least $k$ bins have been filled, we can not use all of the preceding simplifications.


Inclusion-Exclusion

One method of attack is using the Inclusion-Exclusion Principle. Let $S(i)$ be all the possible arrangements where bin $i$ is empty. Then we can compute $N(k)$, the size of the intersection of $k$ of the $S(i)$: there are $\binom{n}{k}$ ways to choose the empty bins and $(n-k)^m$ ways to put the $m$ marbles in to the remaining bins. Therefore, $$ N(k)=\binom{n}{k}(n-k)^m\tag{2} $$ Now, to compute the number of arrangements in which exactly $k$ bins are filled, we compute the number of elements in exactly $n-k$ of the $S(i)$: $$ \begin{align} &\sum_{j}(-1)^{j-n+k}\binom{j}{n-k}N(j)\\ &=\sum_{j}(-1)^{j-n+k}\binom{j}{n-k}\binom{n}{j}(n-j)^m\\ &=\sum_{j}(-1)^{j-n+k}\binom{k}{j+k-n}\binom{n}{k}(n-j)^m\\ &=\binom{n}{k}\sum_{j}(-1)^{k-j}\binom{k}{j}j^m\tag{3} \end{align} $$ The number of ways to get at least $k$ bins filled is $$ \begin{align} &\sum_{i\ge k}\sum_j(-1)^{i-j}\binom{n}{i}\binom{i}{j}j^m\\ &=\sum_i\sum_j(-1)^{k-j}\binom{-1}{i-k}\binom{n}{j}\binom{n-j}{n-i}j^m\\ &=\sum_j(-1)^{k-j}\binom{n}{j}\binom{n-j-1}{n-k}j^m\tag{4} \end{align} $$ The number of bins times the number of ways to get at least $k$ bins filled is $$ \begin{align} &\sum_{i\ge k}\sum_j(-1)^{i-j}i\binom{n}{i}\binom{i}{j}j^m\\ &=\sum_i\sum_j(-1)^{k-j}i\binom{-1}{i-k}\binom{n}{j}\binom{n-j}{n-i}j^m\\ &=\sum_i\sum_j(-1)^{k-j}[k+(i-k)]\binom{-1}{i-k}\binom{n}{j}\binom{n-j}{n-i}j^m\\ &=\sum_i\sum_j(-1)^{k-j}\binom{n}{j}\left[k\binom{-1}{i-k}-\binom{-2}{i-k-1}\right]\binom{n-j}{n-i}j^m\\ &=\sum_j(-1)^{k-j}\binom{n}{j}\left[k\binom{n-j-1}{n-k}-\binom{n-j-2}{n-k-1}\right]j^m\tag{5} \end{align} $$ Dividing $(5)$ by $(4)$ yields the expected value $$ k-\frac{\sum\limits_{j=0}^n(-1)^{k-j}\binom{n}{j}\binom{n-j-2}{n-k-1}j^m}{\sum\limits_{j=0}^n(-1)^{k-j}\binom{n}{j}\binom{n-j-1}{n-k}j^m}\tag{6} $$ where $\binom{-1}{k}=(-1)^k$ and $\binom{-2}{k}=(-1)^k(k+1)$ and $0^0=1$.


Example

For $n=6$, $m=4$, $k=2$: $$ \begin{align} &2-\frac{\small\binom{6}{0}\binom{4}{3}0^4{-}\binom{6}{1}\binom{3}{3}1^4{+}\binom{6}{2}\binom{2}{3}2^4{-}\binom{6}{3}\binom{1}{3}3^4{+}\binom{6}{4}\binom{0}{3}4^4{-}\binom{6}{5}\binom{-1}{3}5^4{+}\binom{6}{6}\binom{-2}{3}6^4} {\small\binom{6}{0}\binom{5}{4}0^4{-}\binom{6}{1}\binom{4}{4}1^4{+}\binom{6}{2}\binom{3}{4}2^4{-}\binom{6}{3}\binom{2}{4}3^4{+}\binom{6}{4}\binom{1}{4}4^4{-}\binom{6}{5}\binom{0}{4}5^4{+}\binom{6}{6}\binom{-1}{4}6^4}\\ &=2-\frac{0-6+0-0+0-(-3750)+(-5184)}{0-6+0-0+0-0+1296}\\ &=\frac{670}{215}\doteq3.11627906976744 \end{align} $$ whereas, without the knowledge that at least two bins were not empty, we would get $$ 6\left(1-\left(1-\frac16\right)^4\right)=\frac{671}{216}\doteq3.10648148148148 $$ Not a terribly large difference, but with $4$ balls into $6$ bins, you wouldn't expect them to all land in one bin, so the knowledge that at least two bins were not empty is not very significant.

Related Question