Understanding Claude Shannon’s Formula for Information and Entropy

information theory

I'm trying to understand Shannon's formula for information as entropy in some detail. (I'm obviously not a trained mathematician, so please bear with me).

Here's what I understand and then following are two questions.

Suppose I have a message composed of "a", where "a" is a symbol in a set X of 32 symbols. My understanding is that this has an entropy of 0.15625 = 1/32 * log2(32).

And correspondingly, if I change the number of symbols in set X, the entropy changes.

where X = 16, H = 0.25 = 1/16 * log2(16)

where X = 8, H = 0.375 = 1/8 * log2(8)

where X = 4, H = 0.5 = 1/4 * log2(4)
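Here is the calculation I'm doing, written out in Python, in case the problem is just in my arithmetic:

```python
import math

# My calculation: entropy of one symbol "a" from a set of n symbols,
# computed as (1/n) * log2(n) -- this is what produces the numbers above
for n in [32, 16, 8, 4, 2]:
    h = (1 / n) * math.log2(n)
    print(f"X = {n:2d}  ->  H = {h}")
```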

But here's my first question:
I would expect that
where X = 2, H should equal 1
but where X = 2, H actually equals 0.5 = 1/2 * log2(2)

My second question: I'm trying to explain to other non-mathematicians what "H" means. After some reading, I'm under the impression that entropy, or H, should correlate with the amount of uncertainty in a message, or the number of yes-or-no questions it should, on average, take to guess the message.

(Would this be: 1/H = number of guesses needed?)

Hence my confusion in question number one. In a set of only two members, "ab", it should take only one guess to identify the message, but H still being 0.5 means that 1/H = 1/0.5 = 2, suggesting I still need two guesses.

However, in the case of a set X = 4, "abcd", this seems to work out. In the set composed of "abcd", I need two guesses to figure out the correct letter. For example: Guess 1: is it a or b? If yes, Guess 2: is it a? If yes, I found it; if not, I know it is b.

But this doesn't seem to work out so well in a set of 8 or 16 or 32.
In the case of a set of 8, "abcdefgh", 1/H would be 1/0.375 ≈ 2.67.

Simply taking the log2 of the size of set X seems to give me what I would have expected H to be: for set X = 8, log2(8) = 3, therefore it takes three guesses.

So perhaps I'm not accurately understanding what H is supposed to represent or perhaps I'm doing something wrong in the calculation of H.

Any thoughts, pointers, or tips would be much appreciated.

Best Answer

  1. A symbol doesn't have entropy. What has entropy is a one-letter string, because it may have different outcomes, and that uncertainty is what leaves room for entropy.

For your coin case (a set of two equally likely symbols), it goes as follows: you have two outcomes, each with probability $p=0.5$, so one coin toss has an entropy of: $$ H=-\sum_{i=1}^2p_i\log_2p_i = -2\times(0.5\log_2 0.5) = 1\textrm{ bit}. $$
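If it helps, here is that same sum as a small Python sketch (just the formula above wrapped in a function; the 32-symbol set from your question is included as an extra check):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over all outcomes."""
    return -sum(p * math.log2(p) for p in probs)

print(entropy([0.5, 0.5]))    # coin toss: 1.0 bit
print(entropy([1/32] * 32))   # 32 equally likely symbols: 5.0 bits = log2(32)
```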

  2. You can explain informational entropy as the minimal average amount of information needed to transmit your data.

For example, we are monitoring a coin game where two players each toss a coin and whoever gets heads wins. If both players toss heads or both toss tails, it's a draw. We want to transmit to a friend the result of the game (draw, player 1 wins, or player 2 wins).

One approach is to just transmit what both players toss: $00$ — both tails, $01$ — player 1 tails, player 2 heads, and so on. This takes 2 bits every time, so 2 bits on average.

But we can do better. Since we are not interested in how the draw came about, only in the fact of a draw, we can send with the first bit whether it was a draw or not, and if it wasn't, we can send with the next bit who won:

0 — draw

10 — player 1 wins

11 — player 2 wins

Since a draw happens half of the time, we will send only one bit half of the time and two bits otherwise, averaging 1.5 bits.
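To make this concrete, here is a minimal Python sketch of that code and its average length (the outcome labels are just my names for the three results described above):

```python
# Prefix code from above: draw -> "0", player 1 wins -> "10", player 2 wins -> "11"
codewords = {"draw": "0", "player 1 wins": "10", "player 2 wins": "11"}

# Four equally likely toss pairs: draw with probability 0.5, each win with 0.25
probabilities = {"draw": 0.5, "player 1 wins": 0.25, "player 2 wins": 0.25}

average_bits = sum(probabilities[outcome] * len(codewords[outcome])
                   for outcome in codewords)
print(average_bits)  # 1.5
```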

Notably, this is the best way to do it: for 3 outcomes with probabilities 0.25, 0.5, and 0.25, informational entropy gives us $$ H=-0.25\log_20.25-0.25\log_20.25-0.5\log_20.5 = 1.5\textrm{ bits}. $$

So no matter what, we cannot devise an encoding of the data that achieves fewer bits per message on average.
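And a quick numerical check that the entropy of this three-outcome distribution really is 1.5 bits, matching the average length of the code above:

```python
import math

probs = [0.25, 0.5, 0.25]  # player 1 wins, draw, player 2 wins
H = -sum(p * math.log2(p) for p in probs)
print(H)  # 1.5
```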
