Unifying discrete and differential entropy with measure theory

entropy, information theory, measure-theory, probability theory

Let's say we have a random variable $X$ with distribution $\mathbb{P}_X$.
I would like to have a single definition of entropy that covers both discrete and continuous random variables.
According to this Wikipedia article, https://en.wikipedia.org/wiki/Information_theory_and_measure_theory, I could define the entropy of $X$ relative to a measure $\rho$ as
$$ H_\rho(X) = - \mathbb{E}_{\mathbb{P}_X}\left[\log \frac{d \mathbb{P}_X}{d \rho}\right]$$
where $\rho$ is a measure on $Val(X)$, which could be either discrete or continuous,
and $\frac{d \mathbb{P}_X}{d \rho}$ is the Radon-Nikodym derivative of $\mathbb{P}_X$ with respect to the measure $\rho$.

Then, using either the counting measure in the discrete case or the Lebesgue measure in the continuous one, I can recover the definitions of Shannon entropy and differential entropy. Am I correct?
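For concreteness, here is how I understand the two special cases (a sketch, writing $p$ for the pmf or pdf of $X$). If $\rho$ is the counting measure on $Val(X)$, then $\frac{d \mathbb{P}_X}{d \rho} = p$ is the pmf and
$$ H_\rho(X) = -\sum_{x \in Val(X)} p(x) \log p(x), $$
which is the Shannon entropy. If $\rho$ is the Lebesgue measure, then $\frac{d \mathbb{P}_X}{d \rho} = p$ is the pdf and
$$ H_\rho(X) = -\int p(x) \log p(x) \, dx, $$
which is the differential entropy.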

If yes, then I have a problem, because I could use the relative entropy between two measures
$\mu$ and $\nu$:
$$ D(\mu || \nu) = \mathbb{E}_{\mu} \left[\log\frac{d \mu}{d \nu}\right] $$
to define the entropy:
$$H_\rho(X) = - D(\mathbb{P}_X || \rho)$$
But we know from Jensen's inequality that $D(\mu||\nu) \geq 0$, which would mean that $H_\rho(X) \leq 0$.
There must be a mistake somewhere, or something I'm missing, but I can't find it…

PS: I know there is already a thread about this subject (Is there a unified definition of entropy for arbitrary random variables?), but it uses the definition of relative entropy from Gray, which, from what I understand, is not exactly what I want.

Best Answer

Do note that Jensen's inequality only gives $D(\mu||\nu) \geq 0$ when the reference measure is a probability measure: the proof of non-negativity of the KL divergence needs both measures to be probability measures. Since the counting measure and the Lebesgue measure are not probability measures, $D(\mathbb{P}_X || \rho)$ can be negative, so there is no contradiction in $H_\rho(X)$ being positive. Otherwise, yes, the KL divergence (and hence $H_\rho$) can be defined using Radon-Nikodym derivatives exactly as you outline.
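To see exactly where the proof uses this, here is a sketch of the usual Jensen argument, assuming $\mu \ll \nu$ with density $f = \frac{d\mu}{d\nu}$ and writing $\Omega$ for the underlying space:
$$ D(\mu || \nu) = \mathbb{E}_{\mu}\left[\log f\right] = -\mathbb{E}_{\mu}\left[\log \tfrac{1}{f}\right] \geq -\log \mathbb{E}_{\mu}\left[\tfrac{1}{f}\right] = -\log \nu(\{f > 0\}) \geq -\log \nu(\Omega). $$
The right-hand side equals $0$ only when $\nu(\Omega) = 1$; for a reference measure of total mass larger than $1$ the bound is negative and harmless. For example, if $X$ is uniform on $n$ points and $\rho$ is the counting measure, then
$$ D(\mathbb{P}_X || \rho) = \sum_{x} \frac{1}{n} \log \frac{1/n}{1} = -\log n \leq 0, $$
so $H_\rho(X) = \log n \geq 0$. In fact, the Jensen bound above just recovers the familiar statement $H(X) \leq \log n$ for a discrete variable taking $n$ values.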
