Probability – Expected Number of Single Socks When Matching Socks

probability

Whenever I go through the big pile of socks that just went through the laundry, and have to find the matching pairs, I usually do this like I am a simple automaton:

I randomly pick a sock, and see if it matches any of the single socks I picked out earlier and that haven't found a match yet. If there is a match, I will fold the two socks together and put them in the 'done' pile, otherwise I will add the single sock to the 'no match yet' pile of single socks, and pick out another random sock.

So, as I was doing this last night, I started thinking about this, and figured that the following would be true: The 'no match yet' pile can be expected to slowly grow, up to some point somewhere in the 'middle' of the process, after which the pile will gradually shrink, and eventually go down back to $0$. In fact, my intuition is that the expected number of loose socks as a function of the number of socks picked so far, is a symmetric function, with the maximum being when I have picked half of the socks.

So, my questions are:

With $n$ pairs of socks, what is the expected number of loose socks that are in my 'no match yet' pile after having picked $k$ socks?

Is it true that this function is a symmetric function, and that the maximum is for $k=n$? (if so, I figure there must be a conceptual way of looking at the problem that makes this immediately clear, without using any formulas … what is that way? Is it just that I can think of reversing the process?)

Of course, this is all assuming there are $n$ pairs of socks total, and that there are no single socks in the original pile, and while this is something that never seems to apply to the pile of socks coming through my actual laundry, let's assume for the sake of mathematical simplicity that there really just are $n$ pairs of socks.

Best Answer

The expected number can be computed via Linearity of Expectation. Let $E[n,k]$ denote the answer and let $\{X_i\}_{i=1}^n$ denote the indicator variable for the $i^{th}$ pair. Thus $X_i=1$ if exactly one member of the $i^{th}$ pair has been chosen in your $k$ trials, and $X_i=0$ otherwise. It is easy to see that $$E[X_i]=2\times \frac k{2n}\times \left(1-\frac {k-1}{2n-1}\right)$$ from which it follows that $$E[n,k]=E\left[\sum X_i\right] =\sum E[X_i]= k\times \left(1-\frac {k-1}{2n-1}\right)$$

Sanity check: $k=1\implies E[n,1]=1$ as it should. Also $k=2n\implies E[n,2n]=0$ as it should.

Remark: it is easily seen that this function is maximized with $k=n$, confirming your intuition. Also the expression can be written as $$E[n,k]=\frac {k(2n-k)}{2n-1}$$ which is symmetric under the exchange of $k,2n - k$ also in line with your expectations.

Remark: more strongly, it is clear that at any time the number of unmatched socks in one pile is the same as the number in the other pile (indeed it's exactly the same pairs of socks which are split between the piles). That provides clear justification for the symmetry.

Related Question