Exponential of Entropy – Geometric Interpretations

Tags: entropy, inequalities, it.information-theory, reference-request

Question:

Might there be a natural geometric interpretation of the exponential of entropy in classical and quantum information theory? This question occurred to me recently via a geometric inequality concerning the exponential of the Shannon entropy.

Original motivation:

The weighted AM-GM inequality states that if $\{a_i\}_{i=1}^n,\{\lambda_i\}_{i=1}^n \in \mathbb{R}_+^n$ and $\sum_{i=1}^n \lambda_i = 1$, then:

\begin{equation}
\prod_{i=1}^n a_i^{\lambda_i} \leq \sum_{i=1}^n \lambda_i \cdot a_i \tag{1}
\end{equation}

As an application, we find that if $H(\vec{p})$ denotes the Shannon entropy of a discrete probability distribution $\vec{p} = \{p_i\}_{i=1}^n$ and $r_p^2 = \lVert \vec{p} \rVert^2$ is the squared $l_2$ norm of $\vec{p}$, then:

\begin{equation}
e^{H(\vec{p})} \geq \frac{1}{r_p^2} \tag{2}
\end{equation}

This result follows from the observation that if $a_i = p_i$ and $\lambda_i = p_i$,

\begin{equation}
e^{-H(\vec{p})} = e^{\sum_i p_i \ln p_i} = \prod_{i=1}^n p_i^{p_i} \tag{3}
\end{equation}

\begin{equation}
\sum_{i=1}^n p_i^2 = \lVert \vec{p} \rVert^2 \tag{4}
\end{equation}

so that (1) gives $e^{-H(\vec{p})} \leq \lVert \vec{p} \rVert^2$, which rearranges to (2). Equality holds exactly when all the $a_i = p_i$ are equal, i.e. for the uniform distribution $p_i = \frac{1}{n}$, which is also the distribution maximising the Shannon entropy.
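As a quick numerical sanity check of (2), here is a minimal NumPy sketch (the helper name shannon_entropy is illustrative, not from any library):

```python
# Check inequality (2): exp(H(p)) >= 1 / ||p||^2, with equality for the
# uniform distribution. Minimal illustrative sketch.
import numpy as np

rng = np.random.default_rng(42)

def shannon_entropy(p):
    """Shannon entropy in nats, ignoring zero entries."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

for _ in range(5):
    n = int(rng.integers(2, 10))
    p = rng.dirichlet(np.ones(n))        # random point on the simplex
    lhs = np.exp(shannon_entropy(p))     # exp(H(p))
    rhs = 1.0 / np.sum(p ** 2)           # 1 / ||p||^2
    assert lhs >= rhs - 1e-12
    print(f"n={n}: exp(H) = {lhs:.4f} >= 1/||p||^2 = {rhs:.4f}")

# Equality at the uniform distribution: both sides equal n.
p = np.full(4, 0.25)
print(np.exp(shannon_entropy(p)), 1.0 / np.sum(p ** 2))  # 4.0 and 4.0
```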

A remark on appropriate geometric embeddings:

If we consider that the Shannon entropy measures the quantity of hidden information in a stochastic system at the state $\vec{p} \in [0,1]^n$, we may define the level sets $\mathcal{L}_q$ in terms of the typical probability $q \in (0,1)$:

\begin{equation}
\mathcal{L}_q = \{\vec{p} \in [0,1]^n: e^{H(\vec{p})} = e^{- \ln q} \} \tag{5}
\end{equation}

which allows us to define an equivalence relation on states $\vec{p} \in [0,1]^n$: two states are equivalent precisely when they have the same entropy. For instance, the uniform distribution on $m \leq n$ outcomes lies in $\mathcal{L}_{1/m}$, since there $e^{H(\vec{p})} = m$. Such a model is appropriate for events which may have $n$ distinct outcomes.

Now, we'll note that $e^{H(\vec{p})}$ has a natural interpretation as a measure of hidden information (an effective number of outcomes), while $e^{-H(\vec{p})}$ may be interpreted as the typical probability of the state $\vec{p}$. Given (5), a natural relation between these two measures may be found using the hyperbolic identities:

\begin{equation}
\cosh^2(-\ln q) - \sinh^2(-\ln q) = 1 \tag{6}
\end{equation}

\begin{equation}
\cosh(-\ln q) - \sinh(-\ln q) = q \tag{7}
\end{equation}

where $2 \cdot \cosh(-\ln q)$ is the sum of these two measures and $2 \cdot \sinh(-\ln q)$ may be understood as their difference. This suggests that the level sets $\mathcal{L}_q$ admit a natural embedding in terms of hyperbolic functions.
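Concretely, on $\mathcal{L}_q$ we have $e^{H(\vec{p})} = e^{-\ln q} = 1/q$ and $e^{-H(\vec{p})} = q$, so writing $x = -\ln q$ and using $e^{\pm x} = \cosh x \pm \sinh x$:

\begin{equation}
e^{H(\vec{p})} + e^{-H(\vec{p})} = \frac{1}{q} + q = 2 \cosh(-\ln q), \qquad e^{H(\vec{p})} - e^{-H(\vec{p})} = \frac{1}{q} - q = 2 \sinh(-\ln q)
\end{equation}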

References:

  1. Olivier Rioul. This is IT: A Primer on Shannon’s Entropy and Information. Séminaire Poincaré, 2018.

  2. David J.C. MacKay. Information Theory, Inference and Learning Algorithms. Cambridge University Press, 2003.

  3. John C. Baez, Tobias Fritz, Tom Leinster. A Characterization of Entropy in Terms of Information Loss. arXiv, 2011.

Best Answer

With apologies for promoting my own work, there's a whole book on the mathematics of the exponentials of various entropies:

Tom Leinster, Entropy and Diversity: The Axiomatic Approach. Cambridge University Press, 2021.

A free copy is available for download, although persons of taste will naturally want to grace their bookshelves with the bound work.

The direct answer to your literal question is that I don't know of a compelling geometric interpretation of the exponential of entropy. But the spirit of your question is more open, so I'll explain (1) a non-geometric interpretation of the exponential of entropy, and (2) a geometric interpretation of the exponential of maximum entropy.

Diversity as the exponential of entropy

As Carlo Beenakker says, the exponential of entropy (Shannon or more generally Rényi) has long been used by ecologists to quantify biological diversity. One takes a community with $n$ species and writes $\mathbf{p} = (p_1, \ldots, p_n)$ for their relative abundances, so that $\sum p_i = 1$. Then $D_q(\mathbf{p})$, the exponential of the Rényi entropy of $\mathbf{p}$ of order $q \in [0, \infty]$, is a measure of the diversity of the community, or "effective number of species" in the community.

Ecologists call $D_q$ the Hill number of order $q$, after the ecologist Mark Hill, who introduced them in 1973 (acknowledging the prior work of Rényi). There is a precise mathematical sense in which the Hill numbers are the only well-behaved measures of diversity, at least if one is modelling an ecological community in this crude way. That's Theorem 7.4.3 of my book. I won't talk about that here.

Explicitly, for $q \in [0, \infty]$ $$ D_q(\mathbf{p}) = \biggl( \sum_{i:\,p_i \neq 0} p_i^q \biggr)^{1/(1 - q)} $$ ($q \neq 1, \infty$). The two exceptional cases are defined by taking limits in $q$, which gives $$ D_1(\mathbf{p}) = \prod_{i:\, p_i \neq 0} p_i^{-p_i} $$ (the exponential of Shannon entropy) and $$ D_\infty(\mathbf{p}) = 1/\max_{i:\, p_i \neq 0} p_i. $$

Rather than picking one $q$ to work with, it's best to consider all of them. So, given an ecological community and its abundance distribution $\mathbf{p}$, we graph $D_q(\mathbf{p})$ against $q$. This is called the diversity profile of the community, and is quite informative. As Carlo says, different values of the parameter $q$ tell you different things about the community. Specifically, low values of $q$ pay close attention to rare species, and high values of $q$ ignore them.
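In code, the three cases above translate directly. Here is a minimal NumPy sketch (the function name hill_number is mine, not the book's); evaluating it over a range of $q$ traces out a diversity profile:

```python
# Hill numbers D_q as defined above, with the limiting cases q = 1 and
# q = infinity handled separately. Illustrative sketch using NumPy.
import numpy as np

def hill_number(p, q):
    """Diversity of order q of a probability distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                          # sums run over nonzero p_i only
    if q == 1:
        return float(np.prod(p ** -p))    # exp(Shannon entropy)
    if q == np.inf:
        return float(1.0 / p.max())
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))

# A small diversity profile: D_q printed against q.
p = [0.5, 0.3, 0.1, 0.1]
for q in [0, 0.5, 1, 2, np.inf]:
    print(f"D_{q} = {hill_number(p, q):.3f}")
```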

For example, here's the diversity profile for the global community of great apes:

[Figure: the ape diversity profile, $D_q(\mathbf{p})$ plotted against $q$]

(from Figure 4.3 of my book). What does it tell us? At least two things:

  • The value at $q = 0$ is $8$, because there are $8$ species of great ape present on Earth. $D_0$ measures only presence or absence, so that a nearly extinct species contributes as much as a common one.

  • The graph drops very quickly to $1$ — or rather, imperceptibly more than $1$. This is because 99.9% of ape individuals are of a single species (humans, of course: we "outcompeted" the rest, to put it diplomatically). It's only the very smallest values of $q$ that are affected by extremely rare species. Non-small $q$s barely notice such rare species, so from their point of view, there is essentially only $1$ species. That's why $D_q(\mathbf{p}) \approx 1$ for most $q$.
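The shape of that profile is easy to mimic numerically. With illustrative abundances (not the book's data), one dominant species at 99.9% and seven equally rare ones already reproduce the drop from $8$ to just above $1$:

```python
# Illustrative abundances only: one dominant species at 99.9% and seven
# equally rare ones, mimicking the shape of the ape diversity profile.
import numpy as np

p = np.array([0.999] + [0.001 / 7] * 7)
D0 = np.sum(p > 0)           # species richness: 8
D1 = np.prod(p ** -p)        # exp(Shannon entropy): about 1.01
Dinf = 1.0 / p.max()         # about 1.001
print(D0, D1, Dinf)          # the profile drops from 8 to just above 1
```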

Maximum diversity as a geometric invariant

A major drawback of the Hill numbers is that they pay no attention to how similar or dissimilar the species may be. "Diversity" should depend on the degree of variation between the species, not just their abundances. Christina Cobbold and I found a natural generalization of the Hill numbers that factors this in — similarity-sensitive diversity measures.

I won't give the definition here (see the paper with Cobbold just mentioned, or Chapter 6 of the book), but mathematically, this is basically a definition of the entropy or diversity of a probability distribution on a metric space. (As before, entropy is the log of diversity.) When all the distances are $\infty$, it reduces to the Rényi entropies/Hill numbers.
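For orientation only, here is a hedged sketch of the similarity-sensitive diversity as I read it from the Leinster–Cobbold paper: one fixes a similarity matrix $Z$ (for a metric space, $Z_{ij} = e^{-d(i,j)}$) and weights each species by its "ordinariness" $(Z\mathbf{p})_i$.

```python
# A sketch of the similarity-sensitive diversity of Leinster-Cobbold (my
# reading of their paper, not code from the book). Z[i, j] = exp(-d(i, j));
# Z = I (all distances infinite) recovers the Hill numbers.
import numpy as np

def similarity_diversity(p, Z, q):
    """Diversity of order q of distribution p under similarity matrix Z."""
    p = np.asarray(p, dtype=float)
    s = p > 0
    Zp = (Z @ p)[s]                       # (Zp)_i: "ordinariness" of species i
    if q == 1:
        return float(np.prod(Zp ** -p[s]))
    if q == np.inf:
        return float(1.0 / Zp.max())
    return float(np.sum(p[s] * Zp ** (q - 1)) ** (1.0 / (1.0 - q)))

# Sanity check: with Z = I this is the Hill number, e.g. 1 / sum(p_i^2) at q = 2.
p = np.array([0.5, 0.3, 0.2])
print(similarity_diversity(p, np.eye(3), 2))   # 1 / 0.38 = 2.631...
```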

And there's some serious geometric content here.

Let's think about maximum diversity. Given a list of species of known similarities to one another — or mathematically, given a metric space — one can ask what the maximum possible value of the diversity is, maximizing over all possible species distributions $\mathbf{p}$. In other words, what's the value of $$ \sup_{\mathbf{p}} D_q(\mathbf{p}), $$ where $D_q$ now denotes the similarity-sensitive (or metric-sensitive) diversity? Diversity is not usually maximized by the uniform distribution (e.g. see Example 6.3.1 in the book), so the question is not trivial.

In principle, the answer depends on $q$. But magically, it doesn't! Mark Meckes and I proved this. So $$ D_{\text{max}}(X) := \sup_{\mathbf{p}} D_q(\mathbf{p}) $$ is a well-defined real invariant of finite metric spaces $X$, independent of the choice of $q \in [0, \infty]$.
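This $q$-independence can be observed numerically. A crude random search over the simplex on a random four-point metric space (an illustrative sketch, assuming the similarity-sensitive formula sketched above) yields roughly the same maximum for every $q$:

```python
# Crude numerical illustration of q-independence of maximum diversity on a
# random 4-point metric space, via random search over the simplex.
import numpy as np

def D(p, Z, q):
    # similarity-sensitive diversity, as in the sketch above
    s = p > 0
    Zp = (Z @ p)[s]
    if q == 1:
        return float(np.prod(Zp ** -p[s]))
    if q == np.inf:
        return float(1.0 / Zp.max())
    return float(np.sum(p[s] * Zp ** (q - 1)) ** (1.0 / (1.0 - q)))

rng = np.random.default_rng(1)
pts = rng.standard_normal((4, 2))                          # 4 points in R^2
d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)   # distance matrix
Z = np.exp(-d)                                             # similarity matrix

qs = [0.5, 1, 2, np.inf]
best = {q: 0.0 for q in qs}
for _ in range(20000):
    p = rng.dirichlet(np.ones(4))                          # random simplex point
    for q in qs:
        best[q] = max(best[q], D(p, Z, q))
print(best)   # the four estimated maxima agree up to sampling error
```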

All this can be extended to compact metric spaces, as Emily Roff and I showed. So every compact metric space has a maximum diversity, which is a nonnegative real number.

What on earth is this invariant? There's a lot we don't yet know, but we do know that maximum diversity is closely related to some classical geometric invariants.

For instance, when $X \subseteq \mathbb{R}^n$ is compact, $$ \text{Vol}(X) = n! \omega_n \lim_{t \to \infty} \frac{D_{\text{max}}(tX)}{t^n}, $$ where $\omega_n$ is the volume of the unit $n$-ball and $tX$ is $X$ scaled by a factor of $t$. This is Proposition 9.7 of my paper with Roff and follows from work of Juan Antonio Barceló and Tony Carbery. In short: maximum diversity determines volume.

Another example: Mark Meckes showed that the Minkowski dimension of a compact space $X \subseteq \mathbb{R}^n$ is given by $$ \dim_{\text{Mink}}(X) = \lim_{t \to \infty} \frac{\log D_{\text{max}}(tX)}{\log t} $$ (Theorem 7.1 of his paper). So, maximum diversity determines Minkowski dimension too.

There's much more to say about the geometric aspects of maximum diversity. Maximum diversity is closely related to another recent invariant of metric spaces, magnitude. Mark and I wrote a survey paper on the more geometric and analytic aspects of magnitude, and you can find more on all this in Chapter 6 of my book.

Postscript

Although diversity is closely related to entropy, the diversity viewpoint really opens up new mathematical questions that you don't see from a purely information-theoretic standpoint. The mathematics of diversity is a rich, fertile and underexplored area, waiting for mathematicians to come along and explore it.