Wrong with the approach for CLRS 5.4-6 : Given n balls and n bins,find expected number of empty bins

balls-in-binsexpected valueprobability

I am trying to find expected number of empty bins after n balls are tossed into n bins. And each toss is independent and equally likely to end up in any bin. Below is my approach.

My indicator variable is
$X_i$ : i bins are empty
$$ Pr[X_i]= \frac{\binom{n}{n-i} * n^\left(n-i\right)}{n^n}$$
And excepted number of empty bins is :

$$
\sum_{i=1}^\left(n-1\right) i*Pr[X_i]
$$

After simplifying the above equation I get:
$$
\sum_{i=1}^\left(n-1\right) \frac{\left(n-1\right)!}{\left(i-1\right)!*\left(n-i\right)!*n^\left(i-1\right)}
$$

But in the solution that I found on the web, the indicator variable is : Let Xi be the event that all the balls fall in bins, other than the ith. And then expected number of empty bins is :

$$
\sum_{i=1}^n \left(\frac{n-1}{n}\right)^n
$$

But according to me the the indicator variable chosen above is wrong. As they are adding the probabilities that ith bin is empty.So at a time only one bin is considered empty. Whereas there can be more than one bin empty at a time.
Is there something wrong with my understanding of the above problem?

Best Answer

The web solution is correct; it uses linearity of expectation. Suppose we label the boxes $1$ through $n$, and we determine $X_1$, the expected number of boxes numbered $1$ that are empty. Since there is obviously only one box numbered $1$, this value is equal to the probability that that one box is empty:

$$ E(X_1) = P(\text{box $1$ is empty}) = \left(\frac{n-1}{n}\right)^n $$

By symmetry, we have $E(X_1) = E(X_2) = \cdots = E(X_n)$, and then by linearity of expectation, the expected number of boxes that are empty is

\begin{align} E(X) & = E(X_1+X_2+\cdots+X_k) \\ & = E(X_1)+E(X_2)+\cdots+E(X_n) \\ & = nE(X_1) \\ & = \frac{(n-1)^n}{n^{n-1}} \end{align}


You can also proceed the way you set out. However, in the expression you give for the probability that exactly $k$ boxes are empty, the numerator $\binom{n}{k} n^{n-k}$ does not, unfortunately, count only those cases; it also includes cases where more than $k$ boxes are empty, too (and overcounts them, to boot).

The correct expression is not trivial; it is

$$ \frac{n!}{k!} S(n, n-k) $$

where $S(\cdot, \cdot)$ are Stirling numbers of the second kind. See this OEIS entry for more details on the particular count. It is also written

$$ \frac{n!}{k!} \left\{ n \atop n-k \right\} $$

All in all, you're better off using linearity of expectation. :-)

Related Question