[Math] Shannon entropy of a fair dice

decision treesentropyinformation theoryprobability theory

The formula for Shannon entropy is as follows,

$$\text{Entropy}(S) = – \sum_i p_i \log_2 p_i
$$

Thus, a fair six sided dice should have the entropy,

$$- \sum_{i=1}^6 \dfrac{1}{6} \log_2 \dfrac{1}{6} = \log_2 (6) = 2.5849…$$

However, the entropy should also correspond to the average number of questions you have to ask in order to know the outcome (as exampled in this guide under the headline Information Theory).

Now, trying to construct a decision tree to describe the average number of questions we have to ask in order to know the outcome of a dice, and this seems to be the optimal one:

Decision tree for dice

Looking at the average number of questions in the image, there are 3 questions in 4/6 cases in 2 questions in 2/6 cases. Thus the entropy should be:

$$\dfrac{4}{6} \times 3 + \dfrac{2}{6} \times 2 = 2.6666…$$

So, obviously the result for the entropy isn't the same in the two calculations. How come?

Best Answer

To recover entropy, you have to consider a sequence of dice throws, and ask how many questions per roll you need in an optimal strategy, in the limit that the number of rolls goes to infinity. Note that each question can cover all the rolls, for example for two rolls, you could ask at some point: “Are the results in $\{16,21,22,23\}$?” (where the first digit denotes the first throw, and the second digit denotes the second throw).

I'm too lazy to do it for 36 possibilities, therefore here a simpler example: Consider a die for which each roll gives only one of three results with equal probability. Then the entropy is about $1.58496$.

For one toss, the optimal strategy is simply to ask “was it $1$?” followed by ”was it $2$?”, which on average gives $5/3 = 1.66$ questions.

For two tosses, an optimal strategy would be to first ask “was it one of $\{11,12,13,21\}$?” (where the first digit gives the result of the first toss, and the second digit the result of the second toss). If the answer is “yes”, then use two questions to single out one of the four results. Otherwise, ask “was the first toss a $2$?”, if yes then it was one of $22$ or $23$, and one question is sufficient to determine that. In the remaining case you know the first toss was $3$ and know nothing about the second, so you employ the one-toss strategy to determine the second toss.

This strategy needs on average $29/9=3.2222$ questions, or $1.61111$ questions per toss. Which is already much better, and indeed only $1.65\,\%$ worse that the value given by the entropy.

Note that the average number of questions of the single-toss optimal strategy can differ dramatically from the entropy. For this, consider the toss of a biased coin. The entropy of this can be made arbitrary low by making the coin sufficiently biased. But obviously there's no way you can get the result of a coin toss with less than one question.

Related Question