Suppose a coupon collector is collecting a set of $n$ coupons that he receives one-by-one uniformly at random.
If the collector stops exactly when the collection is complete, we know the expected number of coupons in his collection is $n H_n$, where $H_n$ is the $n$-th harmonic number. What is the expected number of copies, $M$, of his most-collected coupon?
When $n = 2$, $M = 2$: after the first coupon, the number of further draws until the other type appears is $G \sim \operatorname{Geom}(1/2)$, so the first coupon is collected $G$ times in total while the other type is collected exactly once. The maximum count is therefore $G$ itself, and $M = \mathbf{E}[G] = 2$.
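As a sanity check, the geometric mean above can be summed numerically (a minimal Python sketch; the variable name is my own):

```python
# E[G] for G ~ Geom(1/2) on {1, 2, ...}: sum of g * (1/2)^g.
# For n = 2 the most-collected coupon's count equals G, so this is M.
M2 = sum(g * 0.5**g for g in range(1, 200))
print(M2)  # -> 2.0 (up to a negligible truncation tail)
```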
I don't know the exact value for any $n$ larger than $2$, but I found some approximate values by simulation:
n | M |
---|---|
3 | 2.8415 |
4 | 3.4992 |
5 | 4.0259 |
6 | 4.4633 |
7 | 4.8377 |
8 | 5.1649 |
9 | 5.4560 |
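These values can be reproduced with a short Monte Carlo simulation (a minimal Python sketch; the function name is my own):

```python
import random

def sim_M(n, trials=20000):
    """Monte Carlo estimate of the expected maximum coupon count
    at the moment the collection of n coupon types is completed."""
    total = 0
    for _ in range(trials):
        counts = [0] * n
        seen = 0
        while seen < n:              # draw until every type has appeared
            i = random.randrange(n)  # uniform coupon type
            if counts[i] == 0:
                seen += 1
            counts[i] += 1
        total += max(counts)
    return total / trials

for n in range(2, 10):
    print(n, sim_M(n))
```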
EDIT 1:
For small $n$, enumerating the possible collections directly converges faster than simulation, and from this approach I conjecture that $M_3 = (15 + 6 \sqrt{5})/10$, though I don't have any real argument to support that claim.
EDIT 2:
With some more thought, I find that $M$ has an explicit sum formula. The probability that the collection terminates with counts $v = (c_1, \ldots, c_{n-1})$ for the coupons other than the final collected one is the multinomial coefficient $\binom{c_1 + \cdots + c_{n-1}}{c_1, \ldots, c_{n-1}}$ divided by $n^{c_1 + \cdots + c_{n-1}}$. Weighting $\max_i c_i$ by these probabilities gives an infinite sum over $n-1$ variables. Here is Mathematica code that computes the sum restricted to terms where the collection has no more than $k$ copies of any coupon:
    M[n_, k_] := Sum[Max[v]*(Multinomial @@ v)/n^Total[v], {v, Tuples[Range[1, k], n - 1]}]
Mathematica can't actually evaluate this sum in closed form as $k \rightarrow \infty$, though.
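For readers without Mathematica, the truncated sum can also be evaluated with a short Python equivalent (a sketch; function names are my own). For $n = 3$, the truncation error at $k = 30$ is far below the digits shown:

```python
from math import comb
from itertools import product

def multinomial(v):
    # (c1 + ... + cm)! / (c1! ... cm!) as a product of binomials
    total, out = 0, 1
    for c in v:
        total += c
        out *= comb(total, c)
    return out

def M_trunc(n, k):
    # truncated sum: at most k copies of each of the n-1 non-final coupons
    return sum(max(v) * multinomial(v) / n ** sum(v)
               for v in product(range(1, k + 1), repeat=n - 1))

print(M_trunc(2, 60))  # -> 2.0 (matches the n = 2 case)
print(M_trunc(3, 30))  # -> 2.84164...
```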
Best Answer
Value of $M_3$. I will denote $M$ as $M_n$ to emphasize the dependence on $n$. Then by using A230137, we get
\begin{align*} M_3 &= \sum_{n=2}^{\infty} \frac{\sum_{k=1}^{n-1} \binom{n}{k} \max\{k,n-k\}}{3^n} \\ &= \sum_{n=1}^{\infty} \frac{\left( 4^n + \binom{2n}{n} \right)n - 2(2n)}{3^{2n}} + \sum_{n=1}^{\infty} \frac{\left( 4^n + \binom{2n}{n} \right)(2n+1) - 2(2n+1)}{3^{2n+1}} \\ &= 3\left(\frac{1}{2} + \frac{1}{\sqrt{5}}\right) \\ &\approx 2.84164, \end{align*}
which is the same as what OP conjectured.
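Indeed, the two closed forms agree:
$$ 3\left(\frac{1}{2} + \frac{1}{\sqrt{5}}\right) = \frac{3}{2} + \frac{3\sqrt{5}}{5} = \frac{15 + 6\sqrt{5}}{10} \approx 2.84164. $$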
Asymptotic formula for $M_n$. A heuristic argument suggests that
$$M_n \sim e \log n \qquad \text{as} \quad n \to \infty.$$
The figure below compares simulated values of $M_n$ with the values of the asymptotic formula.
Heuristic argument. First, we consider the Poissonized version of the problem:
1. Let $N^{(1)}, \ldots, N^{(n)}$ be independent Poisson processes with rate $\frac{1}{n}$. Then the arrivals in each $N^{(i)}$ model the times when the collector receives coupons of type $i$.
2. Let $T$ be the first time a complete set of coupons is collected. Then we know that $T$ has the same distribution as the sum of $n$ independent exponential RVs with respective rates $1$, $\frac{n-1}{n}$, $\ldots$, $\frac{1}{n}$. (This is an example of the exponential race problem.) For large $n$, this is asymptotically normal with mean $\sum_{i=1}^{n}\frac{n}{i} \sim n \log n$ and variance $\sum_{i=1}^{n}\frac{n^2}{i^2} \sim \zeta(2)n^2$. This tells us that $T \sim n \log n$ in probability, and so it is not unreasonable to expect that
$$ M_n = \mathbf{E}\Bigl[ \max_{1\leq i\leq n} N^{(i)}_T \Bigr] \sim \mathbf{E}\Bigl[ \max_{1\leq i\leq n} N^{(i)}_{n\log n} \Bigr] \tag{1} $$
holds as well.
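The mean and variance asymptotics for $T$ in step 2 are easy to check numerically (a minimal Python sketch; variable names are my own):

```python
import math

n = 10**6
mean_T = sum(n / i for i in range(1, n + 1))        # E[T] = n * H_n
var_T = sum((n / i) ** 2 for i in range(1, n + 1))  # Var[T] = sum (n/i)^2
print(mean_T / (n * math.log(n)))         # -> ~1.04, tends to 1
print(var_T / ((math.pi**2 / 6) * n**2))  # -> ~1.00 (zeta(2) = pi^2/6)
```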
3. Note that each $N^{(i)}_{n\log n}$ has a Poisson distribution with rate $\log n$. Now, we fix $\theta > 1$ and choose an integer sequence $(a_n)$ such that $a_n > \log n$ and $a_n/\log n \to \theta$ as $n \to \infty$. Then
\begin{align*} \mathbf{P}\Bigl( N^{(i)}_{n\log n} > a_n \Bigr) &\asymp \frac{(\log n)^{a_n}}{a_n!}e^{-\log n} \\ &\asymp \frac{(\log n)^{a_n}}{(a_n)^{a_n} e^{-a_n+\log n} \sqrt{a_n}} \\ &\asymp \frac{1}{n^{\psi(\theta) + 1 + o(1)}}, \qquad \psi(\theta) = \theta \log \theta - \theta, \end{align*}
where $f(n) \asymp g(n)$ means that $f(n)/g(n)$ is bounded away from both $0$ and $\infty$ as $n\to\infty$. This shows that
$$ \mathbf{P}\Bigl( \max_{1\leq i\leq n} N^{(i)}_{n\log n} \leq a_n \Bigr) = \biggl[ 1 - \frac{1}{n^{\psi(\theta) + 1 + o(1)}} \biggr]^n \xrightarrow[n\to\infty]{} \begin{cases} 0, & \psi(\theta) < 0, \\[0.5em] 1, & \psi(\theta) > 0. \end{cases} $$
So, by noting that $\psi(e) = 0$, we get
$$ \max_{1\leq i\leq n} N^{(i)}_{n\log n} \sim e \log n \qquad \text{in probability}. \tag{2} $$
This suggests that $M_n \sim e \log n$ as well.
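The limit $\text{(2)}$ can also be probed by simulation, though convergence is slow (there are lower-order corrections in the location of the maximum), so the ratio to $e \log n$ is still visibly below $1$ at accessible $n$. A minimal Python sketch using Knuth's Poisson sampler (names are my own):

```python
import math
import random

def poisson(lam):
    # Knuth's method; adequate for moderate lam (here lam = log n ~ 10)
    L = math.exp(-lam)
    k, p = 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

def max_ratio(n, trials=20):
    """Average of max_i N_i / (e * log n) over independent trials,
    where N_1, ..., N_n are i.i.d. Poisson(log n)."""
    lam = math.log(n)
    acc = 0.0
    for _ in range(trials):
        acc += max(poisson(lam) for _ in range(n)) / (math.e * lam)
    return acc / trials

print(max_ratio(20000))  # noticeably below 1 at this n; tends to 1 slowly
```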
4. I believe that this heuristic argument can be turned into an actual proof. To do so, we need the following:
We have to justify the relation $\text{(1)}$ in an appropriate sense. We may perhaps study the inequality
$$ \max_{1\leq i\leq n} N^{(i)}_{(1-\varepsilon)n\log n} \leq \max_{1\leq i\leq n} N^{(i)}_T \leq \max_{1\leq i\leq n} N^{(i)}_{(1+\varepsilon)n\log n} \qquad \text{w.h.p.} $$
and show that the probability of the exceptional event is small enough that the difference between the two sides of $\text{(1)}$ is bounded by a vanishing quantity.
Note that $\text{(2)}$ alone is not enough to show this. So, we need to elaborate the argument in step 3 so that it can actually prove the relation $\mathbf{E}\bigl[ \max_{1\leq i\leq n} N^{(i)}_{n\log n} \bigr] \sim e \log n$.