[Math] How is the formula for Shannon entropy derived?

entropy, information theory

According to this slide, the smallest possible number of bits per symbol is given by the Shannon entropy formula:
$$H = -\sum_i p_i \log_2 p_i$$

I've read this post, and I still don't quite understand how this formula is derived from the perspective of encoding with bits.

I'd like to get some hints like those in this post. Please don't tell me that it's just because this is the only formula that satisfies the properties of an entropy function.

Thx in advance~

Best Answer

I suggest reading the proof that H is the only measure (up to a constant factor) that satisfies the axioms of an information measure. It can be found in "The Mathematical Theory of Communication" by Shannon & Weaver.

The proof of the theorem is easy to understand and only a couple of pages long.
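For the encoding-with-bits view the question asks about, a small numerical check may help. The sketch below is not from Shannon & Weaver. It assumes Huffman coding as the concrete prefix code, and the function names and example distributions are made up for illustration. The idea is that a symbol with probability p gets a codeword of roughly -log2(p) bits, so the average length comes out close to H, and exactly H when every probability is a power of 1/2.

```python
import heapq
from math import log2


def shannon_entropy(probs):
    """Return H = -sum(p * log2(p)) in bits, skipping zero-probability symbols."""
    return -sum(p * log2(p) for p in probs if p > 0)


def huffman_code_lengths(probs):
    """Return the codeword length (in bits) that Huffman coding gives each symbol."""
    if len(probs) == 1:
        return [1]  # a lone symbol still needs one bit per occurrence
    # Heap entries: (subtree probability, unique tie-breaker, symbol indices in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie_breaker = len(probs)
    while len(heap) > 1:
        p1, _, symbols1 = heapq.heappop(heap)
        p2, _, symbols2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for i in symbols1 + symbols2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, tie_breaker, symbols1 + symbols2))
        tie_breaker += 1
    return lengths


def average_length(probs, lengths):
    """Expected number of bits per symbol for the given codeword lengths."""
    return sum(p * n for p, n in zip(probs, lengths))


# Hypothetical example: probabilities that are all powers of 1/2.
dyadic = [0.5, 0.25, 0.125, 0.125]
dyadic_lengths = huffman_code_lengths(dyadic)  # [1, 2, 3, 3], i.e. -log2(p) for each p
print(shannon_entropy(dyadic), average_length(dyadic, dyadic_lengths))  # 1.75 1.75

# Hypothetical example: non-dyadic probabilities, where the code cannot hit H exactly.
skewed = [0.7, 0.2, 0.1]
print(shannon_entropy(skewed), average_length(skewed, huffman_code_lengths(skewed)))
```

In the second case the average length is larger than H but less than H + 1, which is the bound the source coding theorem gives for an optimal prefix code. Encoding long blocks of symbols instead of single symbols brings the per-symbol average as close to H as you like, which is the sense in which H is the smallest possible number of bits per symbol.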
