Probability of the outcome of this simple card game

card-gamescombinatoricsprobabilityprobability distributions

I have a deck of $N$ cards. $n$ of the cards are red circles and $N-n$ cards are blue circles. I also have an unlimited supply of red square and blue square cards. I play the following game and want to know the probability distribution of the outcome:

  • Repeat the following process until no circle card left:
  • Pick one of the circle cards from the deck at random and replace it with a square card of the same color (let's call this color $c$). Then pick another card from the deck (it could be a circle or square card of any color) at random and replace it with another square card of color $c$.

Note that in each repeated step of the above process, the second replaced card could possibly be the same as the first square card of that step.

Every time you repeat this process, you replace one or two of the circle cards with square cards. In the end, we will have $N$ square cards. I want to know the probability $P(n'|n)$ of having $n'$ red squares and $N-n'$ blue squares at the end, given $n$ initial red circles.

What do I want:

  • Ideally $P(n'|n)$.
  • If that's not easy, a large $N$ approximation could be equally helpful.
  • Alternatively, finding the mean and the variance of $n'$ would be equally helpful. I'm guessing the mean of $n'$ is $n$.
  • The mean and variance of $n'$ for large $N$ would also be good enough if nothing else works.

Update 1: Numerically, it looks like $\left\langle n'\right\rangle = n$, and for large $N$, $\text{var}(n') \approx\frac{3n(N-n)}{4N}$.

Update 2: antkam's answer gives a beautiful symmetry proof for $\left\langle n'\right\rangle = n$. Now, all I need is:

Prove $\text{var}(n') \approx\frac{3n(N-n)}{4N}$ for large N.

Update 3: Some more numerical results that may help to find a proof: as in antkam's answer, we can define the variable $X_i\in\{0,1,2\}$ to be the number of square cards resulted from the $i$'th circle card. $n'$ can be written as
$$n'=\sum_{i=1}^n X_i.$$
We can write the Var($n'$) as
$$
\begin{align}
\text{Var}(n') &= \left\langle\left(\sum_{i=1}^n X_i\right)^2\right\rangle – \left\langle\sum_{i=1}^n X_i\right\rangle^2\\
&=n\left(\text{Var}(X_1)+(n-1)\text{Cov}(X_1,X_2)\right)
\end{align}
$$

Numerically, I have found (Edit: Now both of these partial results are proven in joriki's answer below):

  • $\text{Var}(X_1) = 3/4$
  • Probabilities are $P(X=0) = P(X=2) = 3/8$ and $P(X=1)=1/4$.

Observations:

  • The equality of $P(X=0) = P(X=2)$ can be justfied through the following observation: For $\sum_{i=1}^N X_i = N$ to be satisfied, the number of $X_i=0$ should be exactly the same as the number of $X_j=2$.
  • For $\text{Var}(n')$ to be $\frac{3n(N-n)}{4N}$, the value of $\text{Cov}(X_1,X_2)$ should be $-\frac{3}{4N}$ to first order in $1/N$.

Best Answer

A symmetry-based proof that $E[n'] = n$. Also possibly an approach for approximating, but not calculating exactly, $Var(n')$.


Imagine the original $N$ circle cards are in fact $N$ different colors, $c_1, ..., c_N$. The same process is done to the deck, using as many square cards (of the same $N$ colors) as you need.

When the process is finished (i.e. no more circle cards), let random variable $X_j$ be the number of final cards of color $c_j$. Clearly:

  • $E[X_1] = E[X_2] = \dots = E[X_N]$ by symmetry,

  • $X_1 + X_2 + \dots + X_N = N$ by conservation of the total number of cards,

  • $E[X_1] + E[X_2] + \dots + E[X_N] = N$ combining the above and using linearity of expectation,

  • and therefore: $E[X_j] = 1$ for all $j$.

Now imagine you are partially colorblind and cannot distinguish between $c_1, ..., c_n$ and think of all of them as "red". The final no. of "red" cards (to your eyes) will be $n' \equiv X_1 + \dots + X_n,$ and so we have $E[n'] = n$. QED


Further thoughts on $Var(n')$:

Note that each of the $N$ colors can have at most $2$ final cards, i.e. $X_j \in \{0, 1, 2\}$. It might be possible (but difficult/tedious) to calculate $Var(X_j)$, based on careful case analysis, e.g. $X_j = 2$ iff that $c_j$ circle card was picked as the first card of some round $t$, and, neither of the $2$ square cards of color $c_j$ added to the deck was subsequently replaced in rounds $> t$.

If we know $Var(X_j)$, then based on $n' \equiv X_1 + \dots + X_n,$ a decent approximation might be $Var(n') \approx n \times Var(X_j)$. This would be exact equality if the $X_j$'s were independent, but of course they are actually dependent. However, in the limit (or maybe more stringently: the large $n$ but much larger $N$ limit, i.e. $1 \ll n \ll N$), the approximation might be quite good.

The OP said numerically it seems $Var(n') \approx {3n(N-n) \over 4N}$. This would be consistent with my thoughts above if $Var(X_j) = \frac34,$ in the $1 \ll n \ll N$ limit.

Related Question