Modified coupon collector’s problem, where you can trade off excess coupons

coupon-collectorprobability

I am struggling with a variation of the coupon collector's problem. Suppose that we need to collect $n$ distinct coupons by buying boxes of toys, where each box contains one coupon with uniform probability. The manufacturer however allows customers to trade any ten coupons (not necessarily the same) for any one coupon.

Now suppose that my strategy is to keep buying boxes and hoarding coupons, until I can trade all my excess coupons to get an entire set of $n$ distinct coupons. What is then the expectation of the number of boxes that I need to buy?

To be more precise, let $X_i$ be the number of unique coupons after buying $i$ boxes, and let $Y_i = i – X_i$. Let $I = \min\{i\in \mathbb{N}:X_i + \frac{Y_i}{10} \geq n\}$. I am then asking for $\mathbb{E}(I)$.

(Edit: cleaned up the notations in the last part)

Best Answer

It is possible to calculate the expectation for the total number of coupons needed when there are $n$ types of coupons and $s$ spare coupons can be swapped at the end for $1$ of a missing type. Clearly the minimum possible needed is $n$ when each coupon drawn is distinct and the maximum is $1+(n-1)s$ when each coupon drawn is the same; any integer in between these is a possible answer. Towards the top of that range the probabilities of needing that many become extremely small, and at the very top is $n^{-(n-1)s}$.

The following R code finds that, with $n=100,s=10$, the expected number is about $209.7689$

probneeded <- function(n, swap){
  if(n==1){return(1)}
  probtable <- rep(0,n*swap)
  probs <- rep(0,n)
  coupons <- 1
  probs[1] <- 1
  probstop <- 0
  while (sum(probs) > 0){
    coupons <- coupons + 1
    probs <- (c(probs,0) * (1:(n+1))/n + c(0,probs) * (n:0)/n)[-(n+1)]
    enough <- which(coupons + (1:n)*(swap-1) >= n * swap)
    probstop[coupons] <- sum(probs[enough])
    probs[enough] <- 0
    }
  probstop
  }

expectedneeded <- function(n, swap){
  probstop <- probneeded(n, swap)
  sum((1:length(probstop)) * probstop)
  }  

expectedneeded(10,10)
# 20.63904
expectedneeded(100,10)
# 209.7689
expectedneeded(1000,10)
# 2100.671
expectedneeded(10000,10)
# 21009.7

Empirically this seems to suggest that with $n > s=10$ the expected number is something about $2.101 n -0.3$.

A handwaving approximate argument might suggest the $2.101$ is reasonable when $s=10$:

  • If you have $x$ coupons, then the expected number of distinct types is $n(1-(1-\frac1n)^x)\approx n-ne^{-x/n}$
  • and the expected number of surplus coupons is then about $x- n+ ne^{-x/n}$.
  • That then suggest you might expect to be just be able to swap to get all $n$ distinct types when $n-ne^{-x/n} +\frac{x- n+ ne^{-x/n}}{s} = n$, i.e. when $\frac x n + (s-1)e^{-x/n} =1 $
  • This has the solution $\frac{x}{n}= 1+ W\left(\frac{s-1}{e}\right)$ using the Lambert W function
  • For $s=10$ you have $1+ W\left(\frac{9}{e}\right) \approx 2.101002997$ which is close to what was observed from the exact calculations; checking other values of $s$ suggests this approach works more generally.

The distribution of the number needed is slightly peculiar. When $n=100,s=10$, the standard deviation is about $11.79$, the median is $208$, as is the mode. The five most likely values are $208,217,199,226,190$ and together they have a probability just over $\frac 12$ of being the stopping point, even though intermediate values are also possible; what these five have in common is that if they are stopping points when there may be no surplus coupons after swapping - for example if you have $88$ distinct types and $120$ spare coupons to swap making to $208$ then you can exactly reach $88+\frac{120}{10}=100$ after swapping with no surplus.