Coupon Collector’s Problem and Stirling numbers of the second kind: rationale

coupon-collectorstirling-numbers

I have basic notions of probability and combinatorics, but I hadn't heard of Stirling numbers (or I forgot them ๐Ÿ˜) until I recently fell into the Coupon Collector's Problem, and into the formula for probability of finding all coupons ($T$) after $k$ attempts:

$$P(T \leq k)=n^{-k}n! {k \brace n} \tag*{(1)} \space\space\space\space\space ; (k \geq n)$$

If $k=n$ then:

$$P(T \leq n)=n^{-n}n! {n \brace n} = n^{-n}n! \tag*{(2)}$$

which can be easily understood as $n^n$ is the total number of combinations of n elements with repetition and $n!$ is the total number of combinations of n elements without repetition. That is, we divide the $n$ elements we search in every possible order by all the possible combinations of $n$ extractions, and being each of this combinations equiprobable, the probability is found.

But I cannot understand the simple case for $k=n+1$, which is, using $(1)$:

$$P(T \leq n+1)=n^{-(n+1)}n! {n+1 \brace n} = n^{-(n+1)}n! \binom{n+1}{2} = \frac{n!}{n^{n+1}} \frac{(n+1)n}{2} \tag*{(3)}$$

My question is: why can't this be expressed simply by the intuitive formula:
$$\frac{(n+1)!}{n^{n+1}} \tag*{(4)}$$

that is, as before, we divide the $n+1$ elements we search in every possible order by all the possible combinations of $n+1$ extractions, taking into account that there are not really $n+1$ different coupons, but that one of them represents a repeated value…

There is a difference between $(3)$ and $(4)$ of $\frac{n}{2}$. I supposse $(4)$ is somehow not counting all equiprobable cases, but even with small cases I can't tell where is the difference.

I've used $n=3$ ($k=4$) as the sample case, because it is the first $n$ with which the numerators of $(3)$ and $(4)$ differ:

$$(n+1)! < n!{n+1 \brace n} \tag*{(5)}$$

(with $n=2$ ($k=3$): $(n+1)! = 3! \space $ and $ n!{n+1 \brace n} = 2!{3 \brace 2} = 2!\binom{3}{2} = 2! ยท 3 = 3!$)

but I cannot tell where does $(4)$ deviates from the equiprobable calculation of cases, because it is not so easy to track the use of the $n+1$ supposed element as the repeated element and compare ${4 \brace 3}$ groups (36) with $4!$ groups (24) and tell where is the mistake… Morevover, watching the 36 different ${4 \brace 3}$ groups makes me feel they are overcounting combinations!

Is there some (intutitive or graphic) way of seeing why $(3)$ is correct vs. $(4)$ is incorrect, or saying it with other words, is there some (intutitive) way of seeing why Stirling numbers of the second kind correctly calculate this? even possibly with values for $k>n+1$.

PS:
Maybe this reasoning fails because $(1)$ is $P(T \leq k)$ and not exactly $P(T = k)$?
$$P(T = k)=n^{-k}n! {k-1 \brace n-1} \tag*{(6)}$$
but nonetheless something fails in my notion of equiprobability and combinatorics even when trying this formula with $k=n+1$… again, why not $(4)$ instead of $(6)$?

Best Answer

When you say

Taking into account that there are not really $n+1$ different coupons, but that one of them represents a repeated value

you should know which value is repeated. You can choose this value in $n$ ways.

Notice, tho, that when you permute all symbols $(n+1)!$ you are counting twice each pattern (cause the two repeated symbols can be one on the left of the other) so you actually have $(n+1)!/2$ that is the $n/2$ that you are missing. The Stirling numbers are used here to count surjective sequences, meaning that you want all of the $n$ elements to appear on the $k$ tries.

Notice, further, that ${n+1\brace n}=\binom{n+1}{2}$ precisely because you are pairing only two elements together, the rest of them are alone. This is the repeated element you are talking about

Related Question