Probability – A Balls-and-Colours Problem

co.combinatoricspr.probabilitypuzzle

A box contains n balls coloured 1 to n. Each time you pick two balls from the bin – the first ball and the second ball, both uniformly at random and you paint the second ball with the colour of the first. Then, you put both balls back into the box. What is the expected number of times this needs to be done so that all balls in the box have the same colour?

Answer (Spoiler put through rot13.com): Gur fdhner bs gur dhnagvgl gung vf bar yrff guna a.

Someone asked me this puzzle some four years back. I thought about it on and off but I have not been able to solve it. I was told the answer though and I suspect there may be an elegant solution.

Thanks.

Best Answer

It can probably be done by looking at the sum of squares of sizes of color clusters and then constructing an appropriate martingale. But here's a somewhat elegant solution: reverse the time!

Let's formulate the question like that. Let $F$ be the set of functions from $\{1,\ldots,n\}$ to $\{1,\ldots,n\}$ that are almost identity, i.e., $f(i)=i$ except for a single value $j$. Then if $f_t$ is a sequence of i.i.d. uniformly from $F$, and $$g_t=f_1 \circ f_2 \circ \ldots \circ f_t$$ then you can define $\tau= \min \{ t | g_t \verb"is constant"\}$. The question is then to calculate $\mathbb{E}(\tau)$.

Now, one can also define the sequence $$h_t=f_t \circ f_{t-1} \circ \ldots \circ f_1$$ That is, the difference is that while $g_{t+1}=g_t \circ f_{t+1}$, here we have $h_{t+1}=f_{t+1} \circ h_t$. This is the time reversal of the original process.

Obviously, $h_t$ and $g_t$ have the same distribution so $$\mathbb{P}(h_t \verb"is constant")=\mathbb{P}(g_t \verb"is constant")$$ and so if we define $\sigma=\min \{ t | h_t \verb"is constant"\}$ then $\sigma$ and $\tau$ have the same distribution and in particular the same expectation.

Now calculating the expectation of $\sigma$ is straightforward: if the range of $h_t$ has $k$ distinct values, then with probability $k(k-1)/n(n-1)$ this number decreases by 1 and otherwise it stays the same. Hence $\sigma$ is the sum of geometric distributions with parameters $k(k-1)/n(n-1)$ and its expectation is $$\mathbb{E}(\sigma)=\sum_{k=2}^n \frac{n(n-1)}{k(k-1)}= n(n-1)\sum_{k=2}^n \frac1{k-1} - \frac1{k} = n(n-1)(1-\frac1{n}) = (n-1)^2 .$$

Related Question