[Math] How Entropy scales with sample size

Tags: discrete-mathematics, entropy, probability-distributions

For a discrete probability distribution, the entropy is defined as:
$$H(p) = -\sum_i p(x_i) \log p(x_i)$$
I'm trying to use entropy as a measure of how "flat/noisy" vs. "peaked" a distribution is, where smaller entropy corresponds to more "peakedness". I want to use a cutoff threshold to decide which distributions are "peaked" and which are "flat". The problem with this approach is that for distributions of the same shape, the entropy differs with the sample size. As a simple example, take the uniform distribution; its entropy is:
$$p_i = \frac{1}{n}\ \ \to \ \ H = \log n$$
To make things worse, there doesn't seem to be a general rule for more complex distributions.
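To make the scaling concrete, here is a minimal Python sketch (numpy is assumed to be available, and the helper name `shannon_entropy` is mine) showing that uniform distributions of different sizes get different entropies even though they are equally "flat":

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy H(p) = -sum_i p_i * log(p_i), natural log."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                      # convention: 0 * log(0) = 0
    return -np.sum(nz * np.log(nz))

# Uniform distributions of different sizes: H = log(n) grows with n.
for n in (4, 16, 64):
    p = np.full(n, 1.0 / n)
    print(n, shannon_entropy(p), np.log(n))
```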

So, the question is:

How should I normalize the entropy so that I get the same "scaled entropy" for "same" distributions irrespective of the sample size?

Best Answer

Use the normalized entropy:

$$H_n(p) = -\sum_i \frac{p_i \log_b p_i}{\log_b n}.$$

The Shannon entropy is maximized by the uniform vector $p_i = \frac{1}{n}$ for $i = 1, \ldots, n$ (with $n > 1$), so dividing by $\log_b n$ gives $H_n(p) \in [0, 1]$. Note that this is simply a change of base: since $\frac{\log_b p_i}{\log_b n} = \log_n p_i$, one may drop the normalization term and compute the entropy directly in base $b = n$.
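As a minimal sketch of this formula (the helper name `normalized_entropy` and the use of numpy are my own choices, not part of the answer), the normalization is a one-line change to the entropy computation:

```python
import numpy as np

def normalized_entropy(p):
    """H_n(p) = H(p) / log(n): entropy in base n, a value in [0, 1].

    Assumes p has n > 1 entries summing to 1.
    """
    p = np.asarray(p, dtype=float)
    n = p.size
    nz = p[p > 0]                      # convention: 0 * log(0) = 0
    return -np.sum(nz * np.log(nz)) / np.log(n)

# A "flat" distribution scores 1 regardless of size; a peaked one scores near 0.
print(normalized_entropy(np.full(10, 0.1)))           # 1.0
print(normalized_entropy(np.full(100, 0.01)))         # 1.0
print(normalized_entropy([0.97] + [0.03 / 99] * 99))  # ~0.06, strongly peaked
```

This makes the quantity comparable across sample sizes, so a single cutoff threshold on $H_n(p)$ can separate "peaked" from "flat" distributions.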
