Entropy – Dimensional Analysis and Differential Entropy

dimensional-analysis, entropy

Differential entropy is the form of entropy defined for continuous distributions. Although it does not share all the properties of the (discrete) Shannon entropy, it is widely used in the scientific literature. Differential entropy is defined as follows:
\begin{equation}
\label{eq:diffentropy}
h(x)=E\{\ell(x)\}=-\int_{S} f(x) \log_{b} f(x)\,dx
\end{equation}

provided the integral exists, where $S$ is the support set of the random variable characterized by the probability density function $f(x)$.
The base $b$ of the logarithm is normally equal to two, but when entropy is measured in nats its value is $e$. Thinking about the dimensions of entropy, I noticed an apparent inconsistency. According to dimensional analysis and the principle of homogeneity, the argument of the logarithm must be dimensionless. If $f(x)$ is dimensionless, so that the argument of the logarithm poses no problem, then the product $f(x)\,dx$ appearing in the expectation is not dimensionless; it has the dimensions of $x$. On the other hand, if we assume
that $f(x)\,dx$ is dimensionless, then the logarithm has an argument that is not dimensionless.
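For a concrete illustration (using only the definition above, with $a$ an arbitrary interval length), take $X$ uniform on $(0,a)$, where $a$ is, say, a length in metres:
$$ h(x)=-\int_0^a \frac{1}{a}\log_b\frac{1}{a}\,dx=\log_b a, $$
so the entropy is the logarithm of a dimensionful quantity, and its numerical value changes if $a$ is expressed in centimetres instead of metres.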
Something is wrong in both cases. One possible solution is to imagine that the differential entropy is obtained from the relative entropy (the Kullback-Leibler divergence)
\begin{equation}
D_{KL}[f||g]=\int_{S} f(x) \log_{b} \frac{f(x)}{g(x)}dx
\end{equation}

using $g(x)=1$. However, this approach is wrong, because $g(x)=1$ cannot in general be a probability density function: a density must integrate to one over its support. The only other alternative, which strikes me as rather odd, is that the principle of homogeneity simply does not apply to differential entropy, but I honestly do not see why that should be.
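(For what it is worth, the divergence itself is dimensionally sound whenever $g$ is a genuine density, since the ratio $f(x)/g(x)$ is dimensionless. For instance, taking $g$ to be the uniform density on an interval of length $m$ containing $S$,
$$ D_{KL}[f||g]=\int_{S} f(x)\log_{b}\big(m\,f(x)\big)dx=\log_{b} m-h(x), $$
so the problematic bare $\log_b f(x)$ only appears once the reference scale $m$ is dropped.)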
If anyone has any ideas on how to resolve this apparent contradiction, I'd be grateful.

Best Answer

Firstly, let's recall that $f(x)$ is a probability density function; it therefore carries the inverse units of $x$, so that the quantity $f(x)\mathrm{d}x$ is always dimensionless and can be interpreted as a probability after integration.

Secondly, note that the variable $x$ itself may be dimensionless; if it is indeed the case, everything is dimensionless and perfectly fine.

Now, let's assume that $x$ is not dimensionless. Then you are right when saying that the argument inside the logarithm is not dimensionless. What can be done in that case? You can adopt two points of view, which are actually equivalent in the end.

We can introduce a normalization constant $c$ (carrying the units of $f$) in order to "kill" the units of $f$ inside the logarithm, i.e. $\log f \rightarrow \log \frac{f}{c}$, hence $\tilde{h}(x) = h(x) + \log c$, which merely shifts the "zero level" of the entropy. In the same spirit, you can renormalize the function $f$ itself, i.e. $\tilde{f} = \frac{f}{c}$, hence $\tilde{h}(x) = \frac{h(x) + \log c}{c}$ (note that $\tilde{f}$ then no longer integrates to one).
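As a quick illustration (the Gaussian is chosen here only because its entropy has a closed form, and $x_0$ is an arbitrary reference scale with the units of $x$): for $f$ a normal density of standard deviation $\sigma$, $h(x)=\frac{1}{2}\log_b(2\pi e\sigma^2)$, and choosing $c=1/x_0$ gives
$$ \tilde{h}(x)=h(x)+\log_b c=\frac{1}{2}\log_b\!\left(\frac{2\pi e\,\sigma^2}{x_0^2}\right), $$
whose argument is now manifestly dimensionless.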

You may also choose to renormalize the variable $x$ itself by a change of variable, i.e. $\tilde{x} = ax$ with $a$ carrying the inverse units of $x$, so that $\tilde{x}$ is dimensionless. The entropy then shifts by a constant, $h(\tilde{x}) = h(x) + \log_b a$, so here again the units only enter through an affine transformation of the entropy.
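This shift is just the usual change-of-variables rule for densities: the density of $\tilde{x}=ax$ is $\tilde{f}(\tilde{x})=f(\tilde{x}/a)/a$, so
$$ h(\tilde{x})=-\int \tilde{f}(\tilde{x})\log_b \tilde{f}(\tilde{x})\,\mathrm{d}\tilde{x}=-\int f(x)\log_b\frac{f(x)}{a}\,\mathrm{d}x=h(x)+\log_b a. $$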

And if you are not comfortable with a "non-dimensionless" differential entropy even after these renormalizations, you are free to renormalize the entropy itself, i.e. $h \rightarrow \frac{h}{\alpha}$, to make it dimensionless.

All the transformations considered above are affine (with respect to $h$) and do not affect the description of the system represented by the distribution $f$: they do not modify the extremal points of the functional, and the "zero level" of the entropy can therefore be chosen freely. This independence with respect to affine transformations comes from the fact that the entropy is meant to be differentiated, since all physical quantities are recovered through its derivatives.
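As a quick numerical sanity check of this affine behaviour (a sketch using SciPy's built-in differential entropy, which works in nats; the Gaussian, $\sigma$ and the conversion factor $a$ are arbitrary choices): rescaling the variable by $a$ shifts $h$ by exactly $\ln a$, so entropy differences are unchanged.

    import numpy as np
    from scipy.stats import norm

    sigma = 2.0   # scale of X in the original unit of x (say, metres)
    a = 100.0     # unit-conversion factor (metres -> centimetres)

    h_x  = norm(scale=sigma).entropy()        # h(X) = 0.5*ln(2*pi*e*sigma^2), in nats
    h_ax = norm(scale=a * sigma).entropy()    # same variable expressed in the new unit

    # The two entropies differ by the constant ln(a): an affine shift of the "zero level".
    print(h_x, h_ax, h_ax - h_x, np.log(a))   # the last two numbers agree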


On a side note: the notion of differential entropy comes from statistical physics $-$ it is basically a microscopic analogue of the more familiar thermodynamic entropy, hence its name $-$ and is given by $$ S[p] = -k_B \int p(x) \ln p(x) \,\mathrm{d}x, $$ where $k_B$ is the Boltzmann constant, which carries the units of entropy. In the same spirit, you can always re-introduce such a constant by renormalization, as said above. It is also worth noting that the aforementioned behaviour of entropy under affine transformations is related to its maximization, which is itself associated with the minimization of energy.

Even if this link to physics may seem to be a coincidence, it is worth noting that the Shannon and differential entropies are closely related to the thermodynamic notion, since they are related, for instance, to the heat dissipated in circuits; that is why these fields share the same vocabulary beyond the mathematical analogy.
