Probability Distribution of Set Coverage After Random Selection – Probability and Statistics

probability, probability distributions, statistics

I have a set of numbers from which I randomly and independently select elements. After a number of these selections, I want to know the coverage of the set, where coverage is the number of elements that have been selected at least once divided by the total number of elements in the set.

To restate this: what is the probability distribution of the coverage of a set after $X$ random, independent selections from it?

Best Answer

If the set has $n$ elements and $M$ is the number of distinct elements selected after a sample of $x$ draws with replacement (so $x$ is the $X$ of the question), then for $0 \le m \le n$ the probability that $M=m$ is

$$\Pr(M=m) = \frac{S_2(x,m) \; n!}{n^x \; (n-m)!} $$

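where $S_2(x,m)$ is a Stirling number of the second kind, i.e. the number of ways to partition the $x$ draws into $m$ non-empty groups. The coverage $M/n$ takes the value $m/n$ with this same probability.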
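For anyone who wants to evaluate this numerically, here is a minimal Python sketch of the formula above. The function names `stirling2` and `coverage_pmf` are my own (they are not from any library), and the Stirling number is computed with the standard inclusion-exclusion sum rather than a library call. Exact fractions are used to avoid rounding, so this is only practical for moderate $n$ and $x$.

```python
from fractions import Fraction
from math import comb, factorial


def stirling2(x, m):
    """Stirling number of the second kind S_2(x, m).

    Uses the inclusion-exclusion identity
    S_2(x, m) = (1/m!) * sum_{j=0}^{m} (-1)^j * C(m, j) * (m - j)^x.
    """
    total = sum((-1) ** j * comb(m, j) * (m - j) ** x for j in range(m + 1))
    return total // factorial(m)  # the sum is always an exact multiple of m!


def coverage_pmf(n, x):
    """Return {m: P(M = m)} for x draws with replacement from n elements.

    Hypothetical helper for illustration; probabilities are exact Fractions.
    """
    pmf = {}
    for m in range(min(n, x) + 1):
        # n! / (n - m)! is the number of ordered choices of m distinct elements
        falling = factorial(n) // factorial(n - m)
        pmf[m] = Fraction(stirling2(x, m) * falling, n ** x)
    return pmf


if __name__ == "__main__":
    n, x = 3, 2
    for m, p in coverage_pmf(n, x).items():
        # m / n is the coverage value that occurs with probability p
        print(f"coverage {Fraction(m, n)}: probability {p}")
```

With $n=3$ and $x=2$, for instance, the two draws either coincide (probability $1/3$, so $M=1$) or differ (probability $2/3$, so $M=2$), and the function returns exactly those values. Dividing each key $m$ by $n$ turns the result into the distribution of the coverage.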

The expected value of $M$ is: $n \left(1- \left(1-\dfrac{1}{n}\right)^x \right)$.
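A short sketch of where this comes from, using indicator variables (the symbols $I_i$ and $C$ are notation introduced here, not part of the original answer): let $I_i = 1$ if element $i$ has been selected at least once in the $x$ draws and $I_i = 0$ otherwise. A single draw misses element $i$ with probability $1-\frac{1}{n}$, and the draws are independent, so

$$\Pr(I_i = 0) = \left(1-\frac{1}{n}\right)^x, \qquad E[M] = \sum_{i=1}^{n} E[I_i] = n\left(1-\left(1-\frac{1}{n}\right)^x\right).$$

Since the coverage is $C = M/n$, its expected value is $1-\left(1-\frac{1}{n}\right)^x$, which is close to $1-e^{-x/n}$ when $n$ is large.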

The variance of $M$ is: $n\left(1-\dfrac{1}{n}\right)^x + n^2 \left(1-\dfrac{1}{n}\right)\left(1-\dfrac{2}{n}\right)^x - n^2\left(1-\dfrac{1}{n}\right)^{2x}$.
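As a sanity check, the two closed forms can be compared against a brute-force enumeration for small cases. The sketch below is my own illustration, with hypothetical function names. It lists all $n^x$ equally likely draw sequences, counts the distinct elements in each, and computes the exact mean and variance, so it is only feasible when $n^x$ is small.

```python
from fractions import Fraction
from itertools import product


def mean_distinct(n, x):
    """Closed-form E[M] = n * (1 - (1 - 1/n)^x)."""
    q1 = Fraction(n - 1, n)  # 1 - 1/n
    return n * (1 - q1 ** x)


def var_distinct(n, x):
    """Closed-form Var(M) as stated in the answer."""
    q1 = Fraction(n - 1, n)  # 1 - 1/n
    q2 = Fraction(n - 2, n)  # 1 - 2/n
    return n * q1 ** x + n * n * q1 * q2 ** x - n * n * q1 ** (2 * x)


def brute_force_moments(n, x):
    """Exact (mean, variance) of M by enumerating every draw sequence."""
    counts = [len(set(seq)) for seq in product(range(n), repeat=x)]
    total = len(counts)  # equals n ** x
    mean = Fraction(sum(counts), total)
    second_moment = Fraction(sum(c * c for c in counts), total)
    return mean, second_moment - mean ** 2


if __name__ == "__main__":
    n, x = 4, 5
    print("closed form:", mean_distinct(n, x), var_distinct(n, x))
    print("brute force:", brute_force_moments(n, x))
```

Both lines should print the same pair of fractions. To get the moments of the coverage $M/n$ rather than of $M$, divide the mean by $n$ and the variance by $n^2$.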