Probability Mass Function of Number of Draws with Replacement Until N Distinct Results are Obtained

combinatoricsdiscrete mathematicsprobabilityprobability distributions

Let $X$ be the discrete random variable that corresponds to the number of draws (with replacement) from a population of $k$ distinct objects required until $n$ distinct results are obtained, where $1\le n\le k$. Each item is equally likely to be drawn.

What is the probability mass function (PMF) of $X$? Its support is $[n,\infty]\cap\mathbb{N}$.

The only portion I'm able to derive is this:

$$P(X=n)=\prod_{i=k-n+1}^{k}\frac{i}{k}$$

But what about on the rest of the support? I assume it's going to involve combinatorics, but I'm not sure how to derive it.

Best Answer

Let $Y_i$ denote the $i^{th}$ sample. To choose a situation where $X=t$, you need to

  • choose the value of $Y_t$ in $k$ ways.
  • choose the set $S$ of $n-1$ values appearing among $\{Y_1,\dots,Y_{t-1}\}$ in $\binom{k-1}{n-1}$ ways.
  • partition the numbers $\{1,2,\dots,t-1\}$ into $n-1$ sets in ${t-1\brace n-1}$ ways. $^*$

  • assign to each part in the partition a distinct element of $S$ in $(n-1)!$ ways. The interpretation is that if a part is assigned the value $x$, then $Y_i=x$ for each $i$ in that part.

Therefore, \begin{align} P(X=t) &=k^{-t}\cdot k\cdot \binom{k-1}{n-1}{t-1\brace n-1}(n-1)! \\&=\boxed{{t-1\brace n-1}\frac{k!}{(k-n)!k^{t}}} \end{align} For example, $$ P(X=n)=\frac{(k-1)(k-2)\dots(k-n+1)}{k^{n-1}} $$ as expected. For another sanity check, you can verify the above is a probability distribution using the fourth equation in https://en.wikipedia.org/wiki/Stirling_numbers_of_the_second_kind#Generating_functions.

$^*$The notation ${m \brace j}$ refers to the Stirling numbers of the second kind, which is the number of ways to partition a set of size $m$ into $j$ parts, where the order of the parts does not matter. This can be computed via $${m\brace j}=\frac1{j!}\sum_{i=0}^j(-1)^i\binom{j}i(j-i)^m.$$

Related Question