Entropy and Probabilistic Algorithms

Tags: entropy, information theory, probability

Recall entropy, from basic information theory:

The entropy of a probability distribution $D$ on a finite set $X$ is $$H(D)=\sum_{x\in X}{p(x) \cdot \log_2{\!(1/p(x))}}$$
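For concreteness, here is a minimal Python sketch of this definition (the helper name `entropy` is just for illustration):

```python
import math

def entropy(probs):
    """Entropy, in bits, of a finite distribution given as a list of probabilities."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# A fair coin carries 1 bit of entropy; a biased coin carries less.
print(entropy([0.5, 0.5]))  # 1.0
print(entropy([0.9, 0.1]))  # ~0.469
```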

I was able to prove that the maximum entropy of any distribution over $[n]$ is $\log_2(n)$, and that it is achieved by the uniform distribution. I was also able to prove that if we merge two outcomes of $D$, say $D=\{p_1,\dots,p_n\}$ and $D'=\{p_1,\dots,p_{n-2},p_{n-1} + p_n\}$, then $H(D')\leq H(D)$.
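Both facts can be checked numerically; a small sketch (illustrative only, reusing the `entropy` helper above):

```python
import math, random

n = 5
uniform = [1.0 / n] * n
weights = [random.random() for _ in range(n)]
arbitrary = [w / sum(weights) for w in weights]

# Maximum entropy over [n] is log2(n), attained by the uniform distribution.
print(abs(entropy(uniform) - math.log2(n)) < 1e-12)  # True
print(entropy(arbitrary) <= math.log2(n))            # True

# Merging two outcomes can only decrease the entropy.
merged = arbitrary[:-2] + [arbitrary[-2] + arbitrary[-1]]
print(entropy(merged) <= entropy(arbitrary))         # True
```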

Using the above, I need to show that if I have a biased coin whose two outcomes have probabilities $p$ and $1-p$, then to obtain a length-$k$ sequence of unbiased coin flips I need, on average, $k/H(p, 1-p)$ tosses of the biased coin.

My calculations are as follows:
To simulate an unbiased coin with the biased one, I flip the biased coin twice: if the two tosses land on different faces, I take the first face as the answer; if both tosses show the same face, I discard them and toss twice again.
The expected number of attempts for one unbiased flip is $1/(2p(1-p))$,
so to obtain $k$ unbiased flips the expected number of attempts is $k/(2p(1-p))$, and the entropy is $$H(p, 1-p) = p\cdot\log\frac{1-p}{p} + \log\frac{1}{1-p}.$$
I can't make a connection between the entropy and my answer.

Best Answer

Let $X$ be the source corresponding to the biased coin, and $Y$ the desired fair coin.

Your procedure needs $Z$ attempts, where $Z$ follows a geometric distribution with success probability $2p(1-p)$, so $E[Z]=1/(2p(1-p))$. Each attempt consumes two instances of $X$. Hence, on average, to produce $k$ instances of $Y$ you need $$ \frac{k}{p(1-p)}$$ tosses of the biased coin.
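As a sanity check, here is a small simulation sketch (illustrative only, not part of the original answer) estimating the number of biased tosses consumed per fair bit and comparing it with $1/(p(1-p))$:

```python
import random

def von_neumann_bit(p):
    """Produce one fair bit from a coin with P(heads) = p; return (bit, tosses used)."""
    tosses = 0
    while True:
        a, b = random.random() < p, random.random() < p
        tosses += 2
        if a != b:
            return int(a), tosses

p, k = 0.3, 100_000
total_tosses = sum(von_neumann_bit(p)[1] for _ in range(k))
print(total_tosses / k)   # empirically close to ...
print(1 / (p * (1 - p)))  # ... the predicted 1/(p(1-p)) ≈ 4.76
```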

You can see that your proposal is correct but highly inefficient by looking at the special case $p=1/2$: you need on average $4k$ tosses, when actually $k$ are enough.

Imagine instead that you take the sequence of $X$ and use it as input to an ideal binary code (that is, $Y$ is the perfect binary compression of $X$). Then you know, by Shannon's first theorem (the source coding theorem) and its corollaries, that:

  • If $n$ input symbols result in $k$ coded symbols, then $k/n \to H(X)$ as $n \to \infty$
  • The coded symbols ($Y$) have maximal entropy, i.e., they correspond to a fair coin.

In conclusion, you need $n = k/H(X)=k/h(p)$ tosses of the biased coin, where $h(p) = H(p, 1-p)$ is the binary entropy function.
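To make the gap concrete, here is a short sketch (my own illustration) tabulating the ideal cost $1/h(p)$ against the cost $1/(p(1-p))$ of the two-toss procedure:

```python
import math

def h(p):
    """Binary entropy function h(p) = H(p, 1-p), in bits."""
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

for p in (0.5, 0.3, 0.1, 0.01):
    ideal = 1 / h(p)              # tosses per fair bit with an ideal code
    two_toss = 1 / (p * (1 - p))  # tosses per fair bit with the two-toss procedure
    print(f"p = {p:<5}  ideal = {ideal:6.2f}  two-toss = {two_toss:8.2f}")
```

At $p=1/2$ the ideal cost is $1$ toss per bit while the two-toss procedure uses $4$; the gap widens as $p$ approaches $0$ or $1$.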

The graph below displays the ideal ratio $n/k=1/h(p)$ (in red) vs. the ratio produced by your algorithm.

[Graph: the ideal ratio $1/h(p)$ (red) vs. the ratio of the proposed two-toss procedure, as functions of $p$]

You can devise a realizable implementation of this idea using arithmetic coding.
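For instance, a toy floating-point arithmetic coder along the following lines (a sketch only; a production implementation would use integer arithmetic with renormalization and handle the case where the interval straddles $1/2$) turns biased flips into nearly unbiased bits at a rate approaching $h(p)$ bits per toss:

```python
import random

def arithmetic_extract(flips, p):
    """Encode a 0/1 sequence of biased flips (P(1) = p) with a toy arithmetic coder.
    The emitted bits form a prefix of the code word and are nearly unbiased."""
    low, high = 0.0, 1.0
    out = []
    for x in flips:
        mid = low + (1 - p) * (high - low)  # split the interval according to P(0), P(1)
        low, high = (low, mid) if x == 0 else (mid, high)
        # Emit a bit whenever the interval lies entirely in one half of [0, 1).
        while high <= 0.5 or low >= 0.5:
            bit = int(low >= 0.5)
            out.append(bit)
            low, high = 2 * low - bit, 2 * high - bit
    return out

# Example: extract bits from 50 flips of a coin with P(1) = 0.3.
flips = [int(random.random() < 0.3) for _ in range(50)]
print(arithmetic_extract(flips, 0.3))
```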
