Fair results from a biased coin: most efficient solution

probability, puzzle, recurrence-relations

A well-known procedure for simulating a fair coin from a biased one (attributed to von Neumann) is to toss it twice: if the results differ, use the last result; otherwise start over. If the biased coin has heads probability $p$, then the expected number of tosses for this is $\frac{1}{p(1-p)}$. If, unbeknownst to us, the coin is in fact unbiased ($p = 0.5$), this evaluates to 4.
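As a concrete sketch of that procedure (the helper name `fair_bit`, the use of Python's `random` module, and the example bias of 0.3 are my own choices):

```python
import random

def fair_bit(p, rng):
    """Von Neumann's trick: toss the biased coin in pairs until a pair
    differs, then report the last toss of that pair."""
    tosses = 0
    while True:
        first = rng.random() < p      # True = heads
        second = rng.random() < p
        tosses += 2
        if first != second:
            return ("H" if second else "T"), tosses

rng = random.Random(0)
results = [fair_bit(0.3, rng) for _ in range(100_000)]
heads = sum(1 for bit, _ in results if bit == "H") / len(results)
mean = sum(t for _, t in results) / len(results)
# heads comes out close to 0.5; mean close to 1/(0.3 * 0.7) ≈ 4.76
```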

We can in fact do better than this by not discarding all the previous toss results when we toss again. For example, just as "HT" and "TH" can respectively be mapped to "T" and "H", so can "HHTT" and "TTHH", or "HHHHTTTT" and "TTTTHHHH". Below is a function in Python that, given the sequence of toss results so far, returns either "H", "T" or None (to indicate that you need to continue tossing).

>>> def unbias(tosses):
...     if len(tosses) % 2 == 1: return None
...     elif set(tosses[-2:]) == {"H", "T"}: return tosses[-1]
...     else: return unbias(tosses[::2])
>>> unbias("HT")
'T'
>>> unbias("HH")
>>> unbias("HHTH")
'H'
>>> unbias("HHTT")
'T'
>>> unbias("HHHH")
>>> unbias("HHHHTH")
'H'
>>> unbias("HHHHTT")
>>> unbias("HHHHTTTT")
'T'

My question is: is there a closed form expression for the expected number of coin tosses here? Specifically, I'm interested in the case when $p$ is 0.5. Simulation indicates that the expectation here is around 3.401.
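For reference, a Monte Carlo sketch of that simulation (the driver function `mean_tosses`, the trial count, and the seed are my own choices; `unbias` is the function above):

```python
import random

def unbias(tosses):
    # same function as in the question
    if len(tosses) % 2 == 1: return None
    elif set(tosses[-2:]) == {"H", "T"}: return tosses[-1]
    else: return unbias(tosses[::2])

def mean_tosses(p, trials=100_000, seed=1):
    """Average number of tosses until unbias() resolves to 'H' or 'T'."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        tosses, result = "", None
        while result is None:
            tosses += "H" if rng.random() < p else "T"
            result = unbias(tosses)
        total += len(tosses)
    return total / trials

# mean_tosses(0.5) comes out close to 3.40
```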

Best Answer

Looking at the probabilities of needing more than $2n$ tosses when $p=\frac12$ suggests an expectation of $$\sum_{n=0}^\infty 2^{a(n)-n}$$ where $a(n)$ is the number of $1$s in the binary representation of $n$, OEIS A000120.

This is $$2^{0-0}+2^{1-1}+2^{1-2}+2^{2-3}+2^{1-4}+2^{2-5}+2^{2-6}+2^{3-7}+2^{1-8}+\cdots \\ \,\\ \approx 3.40147099057$$ which is also $\sum\limits_{n=0}^\infty \frac{1}{b(n)}$ where $b(n)$ is the largest power of $2$ that divides $n!$, OEIS A060818

but I do not see an obvious closed form.
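A quick numerical check (function and variable names are mine): by Legendre's formula the exponent of $2$ in $n!$ is $n - a(n)$, so the two series are term-by-term identical, and a modest partial sum already matches the quoted value:

```python
def a(n):
    """Number of 1s in the binary representation of n (OEIS A000120)."""
    return bin(n).count("1")

def b(n):
    """Largest power of 2 dividing n! (OEIS A060818), which is
    2^(n - a(n)) by Legendre's formula."""
    return 2 ** (n - a(n))

# partial sums of the two series; the tail beyond n = 64 is below 1e-16
s = sum(2.0 ** (a(n) - n) for n in range(64))
t = sum(1.0 / b(n) for n in range(64))
# both sums agree and are within 1e-9 of 3.40147099057
```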


Investigating further, this improvement of about $0.5985$ over the $4$ expected tosses of the traditional procedure you describe turns out to be the largest improvement over the traditional expectation of $\frac{1}{p(1-p)}$ across all values of $p$.

As the coin becomes more biased, the improvement of your method reduces towards $0.5$.

Here is some R code calculating the expectation:

expectedrounds <- function(p){
  q <- 1 - p
  # enough rounds (a power of 2) for the unresolved probability to be negligible
  maxrounds <- 2 ^ ceiling(log(40 / -log(max(p, q), 2), 2))
  unresolvedH <- 0          # distinct heads-counts of unresolved toss sequences
  probresolved <- numeric(maxrounds)
  for (round in 1:maxrounds){
    # each unresolved sequence gains either a tail (+0 heads) or a head (+1)
    potential <- c(unresolvedH, unresolvedH + 1)
    # heads-counts appearing twice pair up and resolve, one to H and one to T
    resolvedH <- potential[duplicated(potential)]
    probresolved[round] <- sum(2 * p^resolvedH * q^(round - resolvedH))
    unresolvedH <- potential[! potential %in% resolvedH]
  }
  # tail correction: treat still-unresolved mass as restarting the simple procedure
  expectedrounds <- sum(probresolved * (1:maxrounds)) +
      sum(p^unresolvedH * q^(maxrounds - unresolvedH)) * (maxrounds + 1/p/q)
  return(expectedrounds)
}

giving for example

> expectedrounds(0.5)
[1] 3.401471
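For cross-checking, here is a Python port of the R routine (the function name `expected_rounds` and the fixed 64-round cutoff are my own choices; the R code picks its cutoff adaptively from $p$):

```python
from collections import Counter

def expected_rounds(p, max_rounds=64):
    """Expected tosses for the multi-level scheme, ported from the R code.

    max_rounds should be a power of 2 large enough that the unresolved
    probability mass is negligible (64 suffices for p near 0.5).
    """
    q = 1.0 - p
    unresolved = {0}          # distinct heads-counts of unresolved sequences
    expectation = 0.0
    for r in range(1, max_rounds + 1):
        # each unresolved sequence gains either a tail (+0 heads) or a head (+1)
        counts = Counter(list(unresolved) + [h + 1 for h in unresolved])
        # heads-counts appearing twice pair up and resolve, one to H and one to T
        resolved = [h for h, c in counts.items() if c == 2]
        expectation += r * sum(2 * p**h * q**(r - h) for h in resolved)
        unresolved = {h for h, c in counts.items() if c == 1}
    # tail correction: treat the leftover mass as restarting the simple procedure
    tail = sum(p**h * q**(max_rounds - h) for h in unresolved)
    return expectation + tail * (max_rounds + 1.0 / (p * q))
```

`expected_rounds(0.5)` agrees with the R output to the printed precision.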

The number of expected rounds for some different probabilities of heads are:

probability of heads   expectedrounds  traditional  improvement
0.5                    3.4014710      4.0000000   0.5985290
0.49  or 0.51          3.4031325      4.0016006   0.5984681
0.4   or 0.6           3.5741042      4.1666667   0.5925625
0.25  or 0.75          4.7684595      5.3333333   0.5648738
0.1   or 0.9          10.5851114     11.1111111   0.5259997
0.01  or 0.99        100.5075887    101.0101010   0.5025123
0.001 or 0.999      1000.5007509   1001.0010010   0.5002501