Coupon Collector’s Problem and Stirling numbers of the second kind: rationale

I have basic notions of probability and combinatorics, but I hadn't heard of Stirling numbers (or I forgot them 😐) until I recently fell into the Coupon Collector's Problem, and into the formula for probability of finding all coupons ($T$) after $k$ attempts:

$$P(T \leq k)=n^{-k}n! {k \brace n} \tag*{(1)} \space\space\space\space\space ; (k \geq n)$$

If $k=n$ then:

$$P(T \leq n)=n^{-n}n! {n \brace n} = n^{-n}n! \tag*{(2)}$$

which can be easily understood as $n^n$ is the total number of combinations of n elements with repetition and $n!$ is the total number of combinations of n elements without repetition. That is, we divide the $n$ elements we search in every possible order by all the possible combinations of $n$ extractions, and being each of this combinations equiprobable, the probability is found.

But I cannot understand the simple case for $k=n+1$, which is, using $(1)$:

$$P(T \leq n+1)=n^{-(n+1)}n! {n+1 \brace n} = n^{-(n+1)}n! \binom{n+1}{2} = \frac{n!}{n^{n+1}} \frac{(n+1)n}{2} \tag*{(3)}$$

My question is: why can't this be expressed simply by the intuitive formula:
$$\frac{(n+1)!}{n^{n+1}} \tag*{(4)}$$

that is, as before, we divide the $n+1$ elements we search in every possible order by all the possible combinations of $n+1$ extractions, taking into account that there are not really $n+1$ different coupons, but that one of them represents a repeated value…

There is a difference between $(3)$ and $(4)$ of $\frac{n}{2}$. I supposse $(4)$ is somehow not counting all equiprobable cases, but even with small cases I can't tell where is the difference.

I've used $n=3$ ($k=4$) as the sample case, because it is the first $n$ with which the numerators of $(3)$ and $(4)$ differ:

$$(n+1)! < n!{n+1 \brace n} \tag*{(5)}$$

(with $n=2$ ($k=3$): $(n+1)! = 3! \space $ and $ n!{n+1 \brace n} = 2!{3 \brace 2} = 2!\binom{3}{2} = 2! · 3 = 3!$)

but I cannot tell where does $(4)$ deviates from the equiprobable calculation of cases, because it is not so easy to track the use of the $n+1$ supposed element as the repeated element and compare ${4 \brace 3}$ groups (36) with $4!$ groups (24) and tell where is the mistake… Morevover, watching the 36 different ${4 \brace 3}$ groups makes me feel they are overcounting combinations!

Is there some (intutitive or graphic) way of seeing why $(3)$ is correct vs. $(4)$ is incorrect, or saying it with other words, is there some (intutitive) way of seeing why Stirling numbers of the second kind correctly calculate this? even possibly with values for $k>n+1$.

PS:
Maybe this reasoning fails because $(1)$ is $P(T \leq k)$ and not exactly $P(T = k)$?
$$P(T = k)=n^{-k}n! {k-1 \brace n-1} \tag*{(6)}$$
but nonetheless something fails in my notion of equiprobability and combinatorics even when trying this formula with $k=n+1$… again, why not $(4)$ instead of $(6)$?

Best Answer

When you say

Taking into account that there are not really $n+1$ different coupons, but that one of them represents a repeated value

you should know which value is repeated. You can choose this value in $n$ ways.

Notice, tho, that when you permute all symbols $(n+1)!$ you are counting twice each pattern (cause the two repeated symbols can be one on the left of the other) so you actually have $(n+1)!/2$ that is the $n/2$ that you are missing. The Stirling numbers are used here to count surjective sequences, meaning that you want all of the $n$ elements to appear on the $k$ tries.

Notice, further, that ${n+1\brace n}=\binom{n+1}{2}$ precisely because you are pairing only two elements together, the rest of them are alone. This is the repeated element you are talking about

Best Answer

Related Solutions

Finding the probability function/distribution of Coupon Collector’s problem without Stirling Numbers

Solving the coupon collector problem in a different way (finding P(T=n) directly)

Related Question