[Math] information theoretic entropy and its physical significance

Tags: entropy, information-theory

I have learned about entropy in my information theory classes. The definition I got from textbooks was the average information content in a message sequence. But in one of the MIT videos related to information theory, the professor said entropy is the information we do not have regarding the message. Are those two the same? Another viewpoint is that entropy is the amount of disorder associated with a message. My doubts are the following:

  1. If we say that the entropy of the English language is 2 bits and that of Hindi is 3 bits, what does that convey?

  2. Compressed data normally has less entropy. Does that mean the disorder associated with compressed data is less?

  3. What is the significance of entropy in relation to genes (in biology), music, etc.?

  4. Lastly, how is the strength of a password related to entropy?

Any help or links or references are appreciated.


VERY IMPORTANT NOTE: The answers to my 2$^{nd}$ question are creating some confusion. First of all, I should have specified the compression method (lossy or lossless). I was discussing this question with one of my friends, and his argument is the following (I am happy to accept it, because it seems more logical than the other explanations here): losslessly compressed data and the original data have the same amount of entropy, since both have the same information content. But if the compression is lossy (like JPEG), the result has less entropy than the original data, because lossy compression discards some information in the process. I invite clarifications/corrections in the form of an answer if anyone has a different opinion or can give a better answer.
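For what it's worth, here is a small experiment (using Python's standard `zlib` module; the helper name and sample text are just my own toy choices) that illustrates the distinction this note is getting at: a lossless round trip recovers the original exactly, so no information is lost, even though the compressed bytes are far fewer and look much more "random" (higher empirical entropy per byte).

```python
import math
import zlib
from collections import Counter

def bytes_entropy(data: bytes) -> float:
    """Empirical (zeroth-order) Shannon entropy of a byte string, in bits per byte."""
    counts = Counter(data)
    n = len(data)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

original = ("the quick brown fox jumps over the lazy dog " * 500).encode()
compressed = zlib.compress(original)

# Lossless round trip: every bit of the original is recoverable,
# so no information has been discarded.
assert zlib.decompress(compressed) == original

print(f"original:   {len(original):7d} bytes, {bytes_entropy(original):.2f} bits/byte")
print(f"compressed: {len(compressed):7d} bytes, {bytes_entropy(compressed):.2f} bits/byte")
```

In other words, lossless compression squeezes the same information into fewer, less redundant symbols; it is lossy compression that actually throws information away.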

Best Answer

The entropy of a message is a measurement of how much information it carries.

One way of saying this (per your textbook) is that a message has high entropy if each word (message sequence) carries a lot of information. Another way of putting it is that if we don't get the message, we lose a lot of information; i.e., entropy is a measure of the number of different things that message could have said. All of these definitions are consistent and, in a sense, the same.
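As a minimal sketch of the textbook definition (the function name and toy strings below are just illustrative), here is how one could estimate the entropy of a source from the empirical frequencies of its symbols:

```python
import math
from collections import Counter

def entropy_bits_per_symbol(message: str) -> float:
    """Estimate H = sum_x p(x) * log2(1/p(x)) from the symbol frequencies
    observed in `message`, in bits per symbol."""
    counts = Counter(message)
    n = len(message)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

# A source that always emits the same symbol tells us nothing new ...
print(entropy_bits_per_symbol("aaaaaaaa"))   # 0.0 bits/symbol
# ... while one whose output is hard to predict carries more information.
print(entropy_bits_per_symbol("abcd" * 4))   # 2.0 bits/symbol
```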

To your first question: the entropy of each letter of the English language is about $2$ bits, whereas each Hindi letter apparently carries about $3$ bits.

The question this measurement answers is essentially the following: take a random sentence in English or Hindi and delete a random letter. On average, how uncertain are we about what that letter was? In English the answer is about $2$ bits; in Hindi, about $3$.

EDIT: the simplest way to explain these measurements is that it would take, on average, $2$ yes/no questions to deduce a missing English letter and $3$ yes/no questions to deduce a missing Hindi letter. On average, there are in fact twice as many Hindi letters (on "average", $2^3=8$ letters) that could fill in a randomly deleted letter in a Hindi passage as English letters (on "average", $2^2=4$ letters). See also Chris's comment below for another perspective.
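As a rough illustration of where such per-letter figures come from (the sample text below is a toy stand-in for a real corpus, so the numbers are only indicative): counting letter frequencies alone gives an upper bound of roughly $4$ bits per English letter, and conditioning on context, here just the previous letter, pushes the estimate down toward the $1$ to $2$ bits quoted above.

```python
import math
from collections import Counter

def entropy_bits(counts: Counter) -> float:
    """H = sum p * log2(1/p) over a frequency table, in bits."""
    n = sum(counts.values())
    return sum((c / n) * math.log2(n / c) for c in counts.values())

# A tiny English sample, letters only (a real estimate needs a large corpus).
sample = (
    "it was the best of times it was the worst of times it was the age of wisdom "
    "it was the age of foolishness it was the epoch of belief it was the epoch of incredulity"
)
text = "".join(ch for ch in sample if ch.isalpha())

# Entropy per letter if letters were independent (frequency-only estimate).
h_unigram = entropy_bits(Counter(text))

# H(next letter | previous letter) = H(letter pair) - H(previous letter):
# knowing the context reduces our uncertainty about the missing letter.
h_pair = entropy_bits(Counter(zip(text, text[1:])))
h_conditional = h_pair - entropy_bits(Counter(text[:-1]))

print(f"frequency-only estimate: {h_unigram:.2f} bits/letter")
print(f"given previous letter:   {h_conditional:.2f} bits/letter")
```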

For a good discussion of this stuff in the context of language, I recommend taking a look at this page.

As for (2), I don't think I can answer that satisfactorily.

As for (3), there's a lot to be done along the same lines as with language. Just as we measure the entropy per word, we could measure the entropy per musical phrase or per base pair. This could give us a way of measuring the importance of damaged/missing DNA, or the number of musically appealing ways to end a symphony. An interesting question to ask about music is: will we ever run out? (video).
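In the same spirit, and purely as a toy sketch (the sequences below are made up), one could measure the entropy per base of a DNA string: a region using all four bases evenly carries up to $2$ bits per base, while a highly repetitive region carries much less.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(seq: str) -> float:
    """H = sum p * log2(1/p) over symbol frequencies, in bits per symbol."""
    counts = Counter(seq)
    n = len(seq)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

# Hypothetical toy sequences: four evenly used bases vs. a repetitive region.
varied     = "ACGTTGCAACGTGTCAGCTA"
repetitive = "ATATATATATATATATATAT"

print(entropy_bits_per_symbol(varied))      # 2 bits/base (all four bases equally often)
print(entropy_bits_per_symbol(repetitive))  # 1 bit/base (only A and T appear)
```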

Password strength comes down to the following question: how many passwords does an attacker have to guess before they can expect to break in? This is very much answerable via entropy.
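As a sketch of that calculation (the lengths and alphabets below are just example policies): if a password is generated uniformly at random, each character contributes $\log_2$(alphabet size) bits, and an attacker enumerating all possibilities needs on the order of $2^{H}$ guesses.

```python
import math

def password_entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy of a password drawn uniformly at random:
    each independent character contributes log2(alphabet_size) bits."""
    return length * math.log2(alphabet_size)

# Hypothetical policies: 8 lowercase letters vs. 8 or 12 printable-ASCII characters.
for length, alphabet in [(8, 26), (8, 94), (12, 94)]:
    h = password_entropy_bits(length, alphabet)
    # A brute-force attacker needs about 2**h guesses to cover every possibility
    # (roughly half that, on average, before hitting the right one).
    print(f"{length} chars, {alphabet}-symbol alphabet: {h:5.1f} bits ~ {2**h:.1e} guesses")
```

Note that this only applies to passwords chosen uniformly at random; human-chosen passwords are far more predictable, so their effective entropy is much lower than this count suggests.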

I hope that helps.
