Estimate number of balls by picking a random bin

balls-in-binscombinatoricsdiscrete mathematicsestimationsampling

This is a tweak to the standard balls and bins problem where we (usually) come up with bounds on the max load or empty bins. I am interested in estimating $M$ when $M$ balls are (uniformly) thrown into $N$ bins by randomly picking one bin and seeing the number of balls in it. How many bins do I need to query to get a good estimate? Intuitively, it seems like seeing one bin should be enough to estimate $M$ upto constant factors (or $\log M$ factors).

Edit: For e.g., let us say that $M = \Omega(N\log N)$.

Edit 2 (see comments): I guess, my only doubt is whether the error (or $\delta$) we choose in Chernoff bounds directly corresponds to the error in the estimate, or not. Suppose fix a $\delta$ for both sides of deviation, then does the estimate also have the same error on both sides (with the same high probability that we get due to Chernoff bounds).

Best Answer

Let $X_1,\ldots,X_N$ be the number of balls in each bin. Suppose you look at $K$ bins; by symmetry, let's say they're the first $k$. Note that $Y:=\sum_{i=1}^K X_i\sim Bin(M,K/N)$. The expected value of $Y$ is $MK/N$, and is the sum of $M$ $\{0,1\}$ random variables, hence by standard Chernoff bounds, for any $0\leq \delta\leq 1$, \begin{align} \Pr(Y\not\in [(1-\delta)MK/N,(1+\delta)MK/N])&=\Pr(NY/K\not\in [(1-\delta)M,(1+\delta)M])\\ &\leq 2\exp(-\delta^2 KM/3N)\\ &\leq 2\exp(-\Omega(1)\delta^2 K\log N)\\ &\leq \frac{2}{N^{\Omega(1)\delta^2K}}. \end{align}

If you want an estimate multiplicatively within a $\delta$ factor with probability at least $1-\epsilon$, you evidently need to query only $O(\log(1/\epsilon)/\delta^2\log N)$ of the buckets and form the naive estimator. In particular, for fixed $\delta$ and $\epsilon$, you can get away with looking at a single bin for $N$ large enough (assuming $M=\Omega(N\log N)$).

Related Question