Log-sum-exp is the conjugate of relative entropy: a quick proof

convex-analysis, entropy, legendre-transformation, measure-theory, probability-theory

Let $(\Omega, \mathcal A, m)$ be a measure space. For measurable functions $f\colon \Omega\to \mathbb R$ such that $\exp(f)$ is integrable, and for measurable $\rho>0$ such that $\int_\Omega \rho\, dm=1$, we define
$$
L(f)=\log\int_\Omega e^{f}\, dm,\quad \text{and}\ \quad H(\rho)=\int_{\Omega} \rho\log\rho\, dm. $$

These two functionals satisfy the Legendre-Fenchel inequality
$$\tag{1}
\int_\Omega f\rho\, dm\le L(f)+H(\rho), $$

with equality if $e^f=C\rho$ for some $C>0$.
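For concreteness, (1) and its equality case can be checked numerically on a finite space with the counting measure; this is only an illustrative sketch, and the names (`n`, `f`, `rho`, `C`) are mine, not from the post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Finite sketch: Omega = {0, ..., n-1}, m = counting measure.
n = 5
f = rng.normal(size=n)            # an arbitrary measurable f
rho = rng.random(size=n)
rho /= rho.sum()                  # density rho > 0 with integral 1

L = np.log(np.exp(f).sum())       # L(f) = log-sum-exp of f
H = (rho * np.log(rho)).sum()     # H(rho) = entropy of rho w.r.t. m

# Legendre-Fenchel inequality (1): <f, rho> <= L(f) + H(rho)
assert (f * rho).sum() <= L + H + 1e-12

# Equality when e^f = C * rho, i.e. f = log(C * rho):
C = 3.0
f_eq = np.log(C * rho)
L_eq = np.log(np.exp(f_eq).sum())
assert np.isclose((f_eq * rho).sum(), L_eq + H)
```

In the equality case both sides reduce to $\log C + H(\rho)$, matching the condition $e^f = C\rho$ above.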

Question. Is there a direct proof of (1) based on Jensen's inequality? On the first page of this paper, J. Lehec suggests that this is "easily seen" to be the case.


This answer seems to be closely related, and it does contain a proof of (1). However, I wonder if there is a "one-line" proof of (1), one that does not need to introduce a lot of definitions.


I can prove (1) by means of the variational calculus. Namely, I can prove that
$$\tag{2}
H(\rho)=\max_{\exp(f)\in L^1} \left( \int_{\Omega} f\rho\, dm - L(f)\right).$$

(This is essentially the same computation that I find, for example, in this question, concerning the finite-dimensional case).

For a bounded $g$, we set $f=f_\star+\epsilon g$ in the term in brackets in (2), noting that $\exp(f_\star +\epsilon g)\in L^1$ whenever $\exp(f_\star)\in L^1$ (that is, $g$ is an admissible variation). Differentiating in $\epsilon$ and setting $\epsilon=0$, we infer that $f_\star$ is a critical point if and only if
$$
\int_\Omega g\rho\, dm = \frac{\int_\Omega e^{f_\star} g\,dm}{\int_\Omega e^{f_\star}\, dm}, \quad \forall g\in L^\infty(\Omega).$$

We immediately see that $f_\star=\log \rho$ is a critical point (not the unique one), and since $L$ is convex, the functional to maximize in (2) is concave, hence all critical points are maximizers.$^{[1]}$ Thus, plugging $f=f_\star$ into (2) we conclude the proof.
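As a sanity check, the variational characterization (2) can be probed numerically in the finite-dimensional case: the bracket attains its maximum value $H(\rho)$ at $f_\star=\log\rho$, and perturbed inputs never exceed it. The discrete setup and names below (`J`, `f_star`) are my own illustration, not from the post.

```python
import numpy as np

rng = np.random.default_rng(1)

# Discrete sketch of (2): J(f) = <f, rho> - L(f), maximized at f_star = log(rho).
n = 6
rho = rng.random(n)
rho /= rho.sum()                       # density with integral 1

def J(f):
    """The bracket in (2): <f, rho> - log-sum-exp(f)."""
    return (f * rho).sum() - np.log(np.exp(f).sum())

f_star = np.log(rho)
H = (rho * np.log(rho)).sum()
assert np.isclose(J(f_star), H)        # the maximum value is H(rho)

# Bounded perturbations of f_star never increase J (concavity of J):
for _ in range(100):
    g = rng.normal(size=n)
    assert J(f_star + 0.5 * g) <= J(f_star) + 1e-12
```

Note that $J(f_\star)=\sum_i \rho_i\log\rho_i - \log\sum_i \rho_i = H(\rho)$, since $\sum_i\rho_i=1$.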


[1] This might require restricting the domain of $L$, since $\{f\,:\,\exp(f)\in L^1\}$ is not a vector space. EDIT: As Olivier Diaz points out in the comments, this domain is a convex set, as a consequence of Hölder's inequality. Therefore, it is true that critical points are maximizers, as stated above.

Best Answer

Note that $\rho \, \mathrm{d}m$ defines a probability measure on $\Omega$. So by Jensen's inequality,

$$ \exp\left(\int_{\Omega} (f - \log\rho) \, \rho \, \mathrm{d}m \right) \leq \int_{\Omega} \exp(f - \log\rho) \, \rho \, \mathrm{d}m = \int_{\Omega} e^{f} \, \mathrm{d}m. $$

Taking logarithms, this reads $\int_\Omega f\rho\,\mathrm{d}m - H(\rho) \le L(f)$, which is exactly the inequality in question.

Moreover, since $\exp(\cdot)$ is strictly convex, the equality holds precisely when $f - \log\rho$ is constant a.s. with respect to $\rho \, \mathrm{d}m$, which is equivalent to $e^f = C\rho$ a.e. with respect to $m$ for some constant $C > 0$.