Probability – Estimating Number of Books in a Library

probability statistics

My cousin is at elementary school and every week is given a book by his teacher. He then reads it and returns it in time to get another one the next week. After a while we started noticing that he was getting books he had read before and this became gradually more common over time. Naturally, I started to wonder how one could estimate the total number of books in their library.

Say the true number of books in the library is $N$ and the teacher picks one uniformly at random (with replacement) to give to you each week. If, by week $t$, you have $x$ times received a book you had already read, is there an unbiased estimator for the total number of books in the library, and what is the variance of this estimator? Is there a biased estimator with lower variance?

In my cousin's case, in the first $30$ weeks he received a book he had already read on $3$ occasions.
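For anyone who wants to experiment with this setup, here is a minimal simulation sketch of the model described above: one book drawn uniformly at random, with replacement, each week. The function name `count_repeats` and the library size of 100 are hypothetical choices for illustration only, not values taken from the real library.

```python
import random


def count_repeats(num_books, num_weeks, rng):
    """Simulate weekly uniform draws with replacement.

    Returns how many of the draws were books already seen earlier.
    """
    seen = set()
    repeats = 0
    for _ in range(num_weeks):
        book = rng.randrange(num_books)  # uniform over 0 .. num_books - 1
        if book in seen:
            repeats += 1
        else:
            seen.add(book)
    return repeats


rng = random.Random(0)  # fixed seed so the run is reproducible
# Assumption: a library of 100 books, observed for 30 weeks.
samples = [count_repeats(100, 30, rng) for _ in range(10_000)]
mean_repeats = sum(samples) / len(samples)
print(f"average number of repeats over 30 weeks: {mean_repeats:.2f}")
```

Running this for several candidate library sizes gives a rough feel for which values of $N$ make 3 repeats in 30 weeks plausible.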

Best Answer

The Good-Turing estimate is given by $$\hat M ={N \over {1-{N_1 \over K}}}$$ where $N$ is the number of distinct books observed (not the true library size, which the question also calls $N$), $N_1$ is the number of books seen exactly once, and $K$ is the total number of observations. For your data, assuming 24 books were seen once and the 3 repeats were 3 different books each seen twice, we have $N = 27$, $N_1 = 24$ and $K = 30$, which yields $$\hat M = {27 \over {1-{24 \over 30}}}= 135.$$ I don't know the standard error of this estimate; look up Good-Turing and see what you can find.
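The arithmetic above can be reproduced with a short sketch. The function name `good_turing_total` and the reading log are hypothetical; the log encodes the same assumption as the answer, namely 24 books seen once and 3 books seen twice.

```python
from collections import Counter


def good_turing_total(observations):
    """Estimate the population size as N / (1 - N_1 / K).

    N   = number of distinct items observed
    N_1 = number of items observed exactly once
    K   = total number of observations
    """
    counts = Counter(observations)
    k = len(observations)
    n_distinct = len(counts)
    n_once = sum(1 for c in counts.values() if c == 1)
    if n_once == k:
        # Denominator 1 - N_1/K is zero: no item was ever repeated.
        raise ValueError("estimate is undefined when no item is repeated")
    # Algebraically equal to N / (1 - N_1/K), arranged to limit rounding error.
    return n_distinct * k / (k - n_once)


# Hypothetical log: books 0..23 read once, books 24..26 read twice (30 weeks).
log = list(range(24)) + [24, 24, 25, 25, 26, 26]
estimate = good_turing_total(log)
print(estimate)
```

Note that the estimate depends on how the repeats are distributed: if the 3 repeats involved fewer than 3 distinct books, $N_1$ would change and so would the result.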
