Can the Fenchel conjugate be characterized through the Fenchel-Young inequality

convex optimizationconvex-analysisduality-theorems

It is known (see for example this question), that the Fenchel conjugate of a smooth function $f: \mathbb{R}^d \to \mathbb{R}$ verifies the identity
$$
f(x) + f^\star(\nabla f(x)) = \langle x, \nabla f(x)\rangle~.
$$

Is it true that this identity characterizes the Fenchel conjugate? That is, is $f^\star$ the only function that verifies the above identity?

Best Answer

In view of the question you linked in your question above (and answers thereof), we know that if $f$ is convex l.s.c, then $f(x) + f^*(u) = \langle x,u\rangle \iff u \in \partial f(x)$, where $\partial f(x) \subseteq X^*$ is the subdifferential of $f$ at $x$ defined by

$$ \partial f(x) := \{x^* \in X^* \mid f(x') \ge f(x) + \langle x^*, x'-x\rangle\;\forall x' \in X\} $$

Here $X^*$ is the topological dual of the real (or complex!) Banach space $X$, the space of all linear functionals $X \mapsto \mathbb R$. The mapping $\langle\cdot,\cdot\rangle : X^* \times X \rightarrow \mathbb R$ is the duality bracket. We consider Banach spaces and not just Hilbert spaces because we want to be as general as reasonable (think about function spaces, spaces of probability distributions, etc.). We'll assume $X$ is reflexive, which means $X^{**} = X$. A crash course on convex analysis in Banach spaces can be found here.

To answer the first part of your question in the affirmative, it then suffices to prove that

Lemma. If $f$ is convex and differentiable at $x$, then $\partial f(x) = \{\nabla f(x)\}$, i.e $\nabla f(x)$ is the unique subgradient of $f$ at $x$.

Proof. See https://math.stackexchange.com/a/1989140/168758. $\quad\quad\Box$

This proves that

The choice $g=f^*$ solves the functional equation $$ \begin{split} &f(x) + g^\star(\nabla f(x)) = \langle x, \nabla f(x)\rangle\;\forall x \in \text{intdom}\;f,\\ &g:\text{intdom}\;f^* \rightarrow X, \end{split} \tag{FY} $$

where $\text{intdom}\;f$ is the interior of $\text{dom}\;f := \{x \in X \mid f(x) < +\infty\}$.

On the uniqueness of $f^*$ as a solution of (FY)

Suppose that $f$ is a Legendre function, meaning that

  • $f$ is convex l.s.c and continuously differentiable on $\text{intdom}\; f$, and
  • the gradient-mapping $\nabla f: \text{intdom}\;f \rightarrow X^*$ is one-to-one (a sufficient condition is that this $f$ is essentially strictly convex; see Fact 2.4 and Corollary 2.6 of https://people.ok.ubc.ca/bauschke/Research/07.pdf).

Then $f^*$ is also a Legendre function and one has the following conjugate-inversion formulae (see Rockafellar's Convex Analysis, Theorem 26.5 for proof!)

$\nabla f^*\nabla f=\text{ the identity operator on }\text{int}\text{dom}\;f$,
$\nabla f\nabla f^* = \text{ the identity operator on }\text{int}\text{dom}\;f^*$.

Now, back to our problem. Suppose $g:\text{int}\;\text{dom}\;f^* \rightarrow X$ is another function which satisfies (FY), and let $u \in \text{int}\;\text{dom}\;f^*$. Then by the above conjugate-inversion formula, we have $u=\nabla f(x_u)$ where $x_u = \nabla f^*(u) \in \text{intdom}\;f$. Thus

$$ g(u) = g(\nabla f (x_u)) = \langle x_u ,\nabla f(x_u)\rangle - f(x_u) = f^*(\nabla f(x_u)) = f^*(u), $$

where the 2nd and 3rd equalities are due to the fact that $g$ and $f^*$ solve (FY).

The above equation shows that $g=f^*$ on $\text{intdom}f^*$, thus proving the unicity of $f^*$ as solution to the functional equation (FY). $\quad\quad\Box$

We have the hammer, where are the nails ?

Note that a by-product of our above analysis is that

The conjugate $f^*$ can be explicity computed like so $$ f^*(u) = \langle (\nabla f)^{-1}(u),u\rangle - f((\nabla f)^{-1}(u)), \;\forall u \in \text{intdom} f^*. \tag{1} $$

As an example, let's prove the popular statement

"Log-sum-exp" is the convex conjugate of "relative entropy".

More precisely, let $X$ be the real Banach space of Radon measures on a topological space $\Omega$ (e.g a finite set), so that $X^*$ is the space of continuous functions on $\Omega$ equipped with the sup-norm, and the duality bracket is given for every $(v,\nu) \in X^* \times X$ by $\langle \nu, v\rangle = \mathbb E_\nu[v] := \int_\Omega vd\nu$. Now, fix a base measure $\mu$ on $\Omega$, and consider the so-called relative entropy functional (relative to the base measure $\mu$)

$$ \begin{split} H(\cdot\|\mu): X &\rightarrow (-\infty,+\infty],\\ \nu &\mapsto H(\nu \| \mu) := \begin{cases}\mathbb E_{\nu}\left[\log\left(\frac{d\nu}{d\mu}\right)\right] = \int_{\Omega}\log\left(\frac{d\nu}{d\mu}\right)d\nu,&\mbox{ if }\nu \ll \mu,\\+\infty,&\mbox{ else.}\end{cases} \end{split} $$ Here, $\dfrac{d\nu}{d\mu}$ is the Radon-Nikodym derivative of $\nu$ w.r.t $\mu$. One can easily check that this is a Legendre function. Now, let $\nu$ be a probability distribution on $\Omega$ with $\nu \ll \mu$, and let $u:\Omega \rightarrow \mathbb R$ be a continuous function. The following computations are trivial

  • $\nabla f(\nu) = \log\left(\frac{d\nu}{d\mu}\right) + 1$ (pointwise logarithm), and so $\nabla f(\nu) = u$ iff $d\nu = (e^{u}/Z)d\mu$, where $Z_u = \mathbb E_\mu[e^u]$.
  • Thus $(\nabla H(\cdot\|\mu))^{-1}(u) \ll \mu$ with Radon-Nikodym derivative equal to $e^u/Z_u$.

Plugging into formula (1) with $f=H(\cdot\|\mu)$, we get

$$ \begin{split} H(\cdot\|\mu)^*(u) &= \langle (\nabla H(\cdot\|\mu))^{-1}(u),u\rangle - f((\nabla H(\cdot\|\mu))^{-1}(u))\\ &= \int_\Omega (ue^u/Z_u) d\mu - \int_\Omega \left(e^u/Z_u\right)\log\left(e^u/Z_u\right)d\mu\\ &= \mathbb E_{\mu}[ue^u/Z_u] - \mathbb E_{\mu}[ue^u/Z_u] + \log(Z_u) = \log(Z_u) = \log(\mathbb E_\mu[e^u]). \end{split} $$

How cool is that ?

Related Question