I came across the definition of Legendre functions and Legendre transformations in my studies (in the sense of convex analysis) and started reading up on them.

I found a definition in Rockafellar's 1996 "Convex Analysis" book. Let $\Psi$ be a proper, closed convex function, meaning $\text{dom}(\Psi) \neq \emptyset$, $\forall \boldsymbol{x} \in\text{dom}(\Psi), \Psi(\boldsymbol{x}) >-\infty$, and $\Psi$ is lower semicontinuous. Let $\Theta=\text{int}(\text{dom}(\Psi))$, where $\text{int}(\text{dom}(\Psi))$ denotes the interior of the domain of $\Psi$. Then $(\Theta, \Psi)$ is said to be a convex function of Legendre type, or simply Legendre, if and only if:

- $\Theta \neq \emptyset$.
- $\Psi$ is strictly convex and differentiable on $\Theta$.
- $\forall \boldsymbol{\theta}_b \in \text{bd}(\Theta), \lim_{\boldsymbol{\theta}\to\boldsymbol{\theta}_b} \| \nabla \Psi(\boldsymbol{\theta})\| = \infty$, where $\boldsymbol{\theta} \in \Theta$ and $\text{bd}$ denotes the boundary.

While the first two conditions are clear to me, the meaning of $\lim_{\boldsymbol{\theta}\to\boldsymbol{\theta}_b} \| \nabla \Psi(\boldsymbol{\theta})\| = \infty$ is a bit obscure: I'm having trouble visualizing it and grasping the intuition behind it. Why precisely is this condition required, or useful?

## Best Answer

As I understand it, the "point" of a Legendre-type function is to be a sufficiently nice smooth(ish), strictly(ish) convex function that is preserved under Fenchel conjugation. I'm really not an expert; the only application I know of such functions is to Bregman distances.

So, what does that third condition mean intuitively? It means that the gradient of the convex function becomes arbitrarily steep as you approach the boundary of the domain (if such a boundary is non-empty). Here are some illustrations:

The function $f(x, y) = -\sqrt{1 - x^2 - y^2}$, defined over the unit disc $\|(x, y)\| \le 1$, is a Legendre type function:

Observe that the function becomes steeper as you approach the boundary circle, to the point where the only hyperplane that could support the epigraph on a boundary point is vertical. The function is also smooth on the interior of its domain, as well as strictly convex (everywhere).
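If a picture isn't at hand, a quick numerical check makes the same point. The sketch below is only an illustration (the helper names `grad_norm` and `norms_along_ray` are mine, not from Rockafellar); it uses the hand-computed gradient $\nabla f(x, y) = (x, y)/\sqrt{1 - x^2 - y^2}$ and evaluates its norm along a ray approaching the boundary circle.

```python
import math

def grad_norm(x, y):
    """Euclidean norm of the gradient of f(x, y) = -sqrt(1 - x^2 - y^2).

    The gradient is (x, y) / sqrt(1 - x^2 - y^2), valid on the open unit disc.
    """
    s = math.sqrt(1.0 - x * x - y * y)
    return math.hypot(x / s, y / s)

def norms_along_ray(angle, radii):
    """Gradient norms at the points r * (cos(angle), sin(angle)) for r in radii."""
    return [grad_norm(r * math.cos(angle), r * math.sin(angle)) for r in radii]

# Walk towards the boundary circle along an arbitrarily chosen direction.
radii = [0.9, 0.99, 0.999, 0.9999]
for r, n in zip(radii, norms_along_ray(math.pi / 3, radii)):
    print(f"r = {r}: ||grad f|| = {n:.3f}")
```

By rotational symmetry the norm depends only on the radius $r$ and equals $r/\sqrt{1 - r^2}$, so the printed values should grow without bound as $r \to 1$.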

It's also possible that the function has an asymptote at the boundary. Consider the Legendre function $\frac{1}{1 - x^2}$ on the domain $(-1, 1)$:

Once again, the function becomes arbitrarily steep, but the epigraph is not supported by even vertical hyperplanes. And the function is strictly convex and smooth on $(-1, 1)$.

The third "case" is when there are no boundary points. Take, for example, the hyperbola $f(x) = \sqrt{1 + x^2}$, defined on $\Bbb{R}$:

In this case, the gradient is not arbitrarily steep; indeed, its norm (or absolute value in this case) is less than $1$. However, since there are no boundary points of the domain, the third condition is vacuously satisfied. The function is also still strictly convex and smooth on $\Bbb{R}$, though the strict convexity becomes trickier to identify as you approach the slanting asymptotes.

To wrap it up, let's end with a non-example. Consider the function $f(x) = (x - 1)^2$ on the domain $[0, \infty)$:

This function is smooth and strictly convex on $(0, \infty)$, but the gradient approaches $-2$ as $x \to 0$ from the right. This means that there are (infinitely many) non-vertical hyperplanes (well, lines) supporting the epigraph of $f$ at the point $(0, 1)$. This is in violation of the third condition; the function is not Legendre (despite being smooth and strictly convex).
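To see the contrast numerically, here is a small sketch (the function names are just illustrative) comparing the derivative of the Legendre example $\frac{1}{1 - x^2}$ near its boundary point $x = 1$ with the derivative of the non-example $(x - 1)^2$ near its boundary point $x = 0$. Both derivatives are computed by hand: $\frac{2x}{(1 - x^2)^2}$ and $2(x - 1)$.

```python
def d_asymptote(x):
    """Derivative of 1 / (1 - x^2) on (-1, 1)."""
    return 2.0 * x / (1.0 - x * x) ** 2

def d_parabola(x):
    """Derivative of (x - 1)^2 on (0, inf)."""
    return 2.0 * (x - 1.0)

# eps is the distance to the boundary point: x = 1 for the first function,
# x = 0 for the second.
for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
    steep = abs(d_asymptote(1.0 - eps))
    flat = abs(d_parabola(eps))
    print(f"eps = {eps:g}: |d/dx 1/(1-x^2)| = {steep:.4g}, |d/dx (x-1)^2| = {flat:.4g}")
```

The first column should blow up as `eps` shrinks, while the second settles near $2$, which is exactly the failure of the third condition.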

So, that's what the third condition "looks like", if that's the kind of intuition you were looking for.

Why do we insist upon this third condition? Well, as I said previously, the Legendre-type functions are supposed to be preserved under Fenchel conjugation.

In the fourth (non-)example, the multiple hyperplanes supporting a single point on the graph imply that the Fenchel conjugate $f^*$ is not strictly convex. Recall the reciprocity relation $y \in \partial f(x) \iff x \in \partial f^*(y)$. The many non-vertical hyperplanes imply that $\partial f(0)$ is multivalued (indeed, we can show explicitly that $\partial f(0) = (-\infty, -2]$ in this particular case). By reciprocity, this means that multiple points $y$ (namely, the points in $\partial f(0)$) all have $0$ as a subgradient of $f^*$. This implies that $f^*$ is constant on $(-\infty, -2]$, and no strictly convex function can be constant on a non-trivial interval. Even if the common subgradient weren't $0$, $f^*$ would coincide on that interval with an affine function whose slope is the common subgradient, which again rules out strict convexity.
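If you want to check the flat piece for yourself, the sketch below approximates $f^*(y) = \sup_{x \ge 0} \big(xy - (x - 1)^2\big)$ by brute force on a grid and compares it with the closed form I get by hand, $f^*(y) = y + y^2/4$ for $y \ge -2$ and $f^*(y) = -1$ for $y \le -2$. The grid bound `x_max` and resolution `n` are arbitrary choices of mine, adequate only for moderate $y$ (the maximizer $1 + y/2$ has to lie inside the grid).

```python
def f(x):
    """The non-example f(x) = (x - 1)^2, considered on [0, inf)."""
    return (x - 1.0) ** 2

def conjugate(y, x_max=10.0, n=100_001):
    """Grid approximation of f*(y) = sup over x in [0, x_max] of x*y - f(x)."""
    step = x_max / (n - 1)
    return max(i * step * y - f(i * step) for i in range(n))

def closed_form(y):
    """Hand-derived conjugate: y + y^2/4 for y >= -2, constant -1 for y <= -2."""
    return y + y * y / 4.0 if y >= -2.0 else -1.0

for y in [-5.0, -3.0, -2.0, -1.0, 0.0, 1.0]:
    print(f"y = {y:+.1f}: grid = {conjugate(y):.6f}, closed form = {closed_form(y):.6f}")
```

For every $y \le -2$ the supremum is attained at the boundary point $x = 0$, which is why the conjugate is stuck at $-f(0) = -1$ there.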

For a formal proof, I'll leave it to Rockafellar to explain. This answer is long enough as it is.