Expectation problem of postman

conditional-expectationexpected valuepuzzle

A postman brought N letters to a house with two letter-boxes. Since the two boxes were empty, he puts 1 mail in each of the two mail boxes. Then he chooses one of boxes with probability proportional to number of letters present in that box, and puts the 3rd letter in it. He does this for all subsequent letters. What is the expected number of letters in the box with lower letters?

This problem is from Crazy Postman.

My approach: I used recurrence for solving this problem. Suppose $E_n$ is the expected number of letters in the box with lower letters. Now, $E_{n-1}$ is expected number of letters in box with lower letters if number of letters is n-1, so $n-1-E_{n-1}$ is the expected number of letters in box with higher letters. Now,

$E_n = \frac{n-1-E_{n-1}}{n-1} (E_{n-1}) + \frac{E_{n-1}}{n-1}(E_{n-1}+1)$

Is this correct?

Explanation of the equation: With probability of number of letters in higher letters box, we can say number of letters in lower letters box remains constant and with 1- that probability, number of letters in lower letters box will increase by 1.

My question is can we take the probability by the expected number of letters or do we need the exact number?

Additionally, can someone explain the solution given in the site? I have quoted it here.

Suppose I have a stack of 2 black cards and one red card. Initially I put red between 2 black cards. Now I add black cards randomly between any two cards (so, initially it is either above or below red). Note that the probability that I add the card above the red card, when x-1 is the number of cards above red and y-1 is the number of cards below red is x/(x+y). Let the problem be: if red card is dividing the black cards into two sets, what is the expected number of black cards in the smaller section. So, we see that the two problems are equivalent.
Now this way, we are getting all possible combinations in which one red and n black cards can be mixed, we see that the probability that the red card is at height h is independent of h. So, the probability that the smallest box contains n/2 letter or 1 letter (or any number of letters between 1 and n/2) are all same. So, expected number of letters in the smaller box is asymptotically n/4.

I am unable to understand the logic/flow of the solution given. Kindly help.

Best Answer

Your approach does not work, and I am not sure if it can be fixed. The errors in your approach are subtle.

The first error is best illustrated by looking at $n=3$. We start with $E_2=1$. Clearly, $E_3=1$, because the smaller box will certainly have exactly one letter when three letters are placed in the boxes. However, your formula gives $$ E_3\stackrel{?}= \frac{3-1-1}{3-1}\cdot 1+\frac{1}{3-1}(1+1)=3/2\neq 1. $$ What went wrong? Well, when the two letter have an equal number of letters, the number of letter in the smaller box cannot increase by adding a single letter, so the $(E_{n-1}+1)$ part of your formula needs to be replaced by $E_{n-1}$ in this case.

If this was the only error, your method could be salvaged, but there is another problem. Let $M_n$ be the random variable equal to the number of letters in the smaller box after $n$ letters have been put in the boxes. This means that $\mathbb E[M_n]=E_n$. It is correct to say that $$ \newcommand{\E}{\mathbb E} \E[M_n\mid M_{n-1}]= \begin{cases} (1-\frac{M_{n-1}}{n-1})M_{n-1}+\frac{M_{n-1}}{n-1}\cdot (M_{n-1}+1) & \text{if $M_{n-1}\neq (n-1)/2$} \\ M_{n-1} & \text{if $M_{n-1}=(n-1)/2$} \end{cases} $$ Note that this is a conditional expectation. We need conditional expectation because the probability of $M_n=M_{n-1}+1$ depends on the exact value of $M_{n-1}$.

Let us now take the expected value of both sides of the this equation. On the left hand side, we get $\E[\E[M_n\mid M_{n-1}]]=\E[M_n]=E_n$. However, on the right hand side, we get (I am ignoring the second case, for simplicity): $$ \E\left[(1-\frac{M_{n-1}}{n-1})M_{n-1}\right]+\E\left[\frac{M_{n-1}}{n-1}\cdot (M_{n-1}+1)\right] $$ However, since you can not distributed the expected value over a product (in general, $\E[YZ]\neq \E[Y]\E[Z])$, there is no way to simplify this expression in a way that $E_n$ appears. Therefore, your recurrence strategy cannot be made to work.

The solution is to derive a recurrence for the full probability distribution of the number of letters in both boxes, not just the expected value. Let $P(n,k)$ be the probability that the box on the left has exactly $k$ letters, after the combined total of the boxes has $n$ letters. You can show that $$ P(n,k)= P(n-1,k-1)\cdot \frac{k-1}{n-1} + P(n-1,k)\cdot \frac{n-k-1}{n-1} $$ This is because there are two ways for the left box to have $k$ letters. Either the left box has $k-1$ letters after $n-1$ letters, and then receives another letter with probability $(k-1)/(n-1)$, or it has $k$ letters, and then the other box gets the $n^\text{th}$ letter with probability $(n-1-k)/(n-1)$. Using this formula, you can compute the full list of values $P(n,k)$ for $1\le k \le n-1$ for a couple small values of $n$. Eventually, you will notice a simple pattern. You can then prove that this pattern hols for all $n\ge 1$, by induction on $n$.

Then, once you know the full probability distribution of both boxes, you can derive the expected value of the smaller box. This was the idea behind this answer to the duplicate question: https://math.stackexchange.com/a/1441588/.

Related Question