[Math] Estimate the number of elements by random sampling with replacement

probability

Setup:

  1. $N$ numbered balls are in a bag. $N$ is unknown.
  2. Pick a ball uniformly at random, record its number, replace it, shuffle.
  3. After $M$ samples, of which we noticed $R$ repeated numbers, how can we estimate the value of $N$?

Clearly, after we've covered the set, only repeats will appear. However, there is a vanishingly small probability we've just missed one.

Simplification:

Assume we know $N<n^\star$, which should make it easy to compute $P(N=n|R=r,M=m)$ for all $n<n^\star$.

Possibly related:

Application:

Estimating the number of webcomics when I'm pressing "random," assuming random is uniform over the comics.

Best Answer

Chance of getting a repeat = (M-R)/N which we are going to call a "failure". Call this probability (M-R)/N = (1-P). Now the number of successful draws we have had (without getting a repeat) is M-R. Call this K. Now we just have a negative binomial distribution with R failures occurring.

The mean of the neg. binomial distribution is the number of successes before the Rth failure. In this case M-R. So M-R = pR/(1-p) with p being defined above. $$ \ M-R = (1 - (M-R)/N))R / (M-R)/N $$ $$ \ = (R - (M-R)R/N) * (N / (M-R)) = NR/(M-R) - R = M - R $$ $$ \ = M = NR/(M-R) = M^2 - RM = NR --> N = (M^2/R) - M $$

Related Question