Characterization of Lipschitz derivative

calculuslipschitz-functionsmultivariable-calculus

Let $f \colon O \to \mathbb R$ be continuously differentiable, where $O \subset \mathbb R^n$ is open and convex. Then, the following two assertions are equivalent:

  1. $\nabla f$ is Lipschitz-continuous with constant $1$, i.e.,
    $$ \| \nabla f(y) – \nabla f(x) \| \le \| y – x \| \qquad \forall x,y \in O.$$
  2. we have
    $$ \bigl| (\nabla f(y) – \nabla f(x) )^\top (y – x) \bigr| \le \|y – x\|^2 \qquad\forall x,y \in O.$$

I am looking for a concise proof or a reference for this equivalence.

The direction $1 \Rightarrow 2$ is just an application of the Cauchy-Schwarz inequality.

The other direction is slightly harder. Let me outline a proof. If we assume additionally $f \in C^2$, then by using $y = x + t h$, with fixed $h \in \mathbb R^n$ in 2, we obtain
$$ |h^\top \nabla^2f(x) h| \le \| h \|^2 \qquad\forall x \in O, h \in \mathbb R^n$$
by passing to the limit $t \to 0$.
Since $\nabla^2 f(x)$ is symmetric, this yields $\| \nabla^2 f(x) \| \le 1$ and the Lipschitz continuity of $\nabla f$ follows.
In the general case, one can apply this argument to the mollification $f_\epsilon = f \mathbin\star \psi_\epsilon$, where $\psi_\epsilon$ is a standard mollifier.

Is it possible give a direct proof of $2 \Rightarrow 1$ without the 'detour' to $C^2$-functions?

By the way: Although the statements 1 and 2 only depend on $\nabla f$, the equivalence no longer holds if $\nabla f$ is replaced by an arbitrary, continuous function $g \colon O \to \mathbb R^n$: Use, e.g., $g(x) = A x$, where $A \in \mathbb R^{n \times n}$ is skew symmetric.

Best Answer

By the fundamental theorem $$ f(y) - f(x) - \nabla f(x)^T(y-x) = \int_0^1 (\nabla f(x+t(y-x))-\nabla f(x))^T(y-x) dt. $$ Let me first prove the claim for $O=\mathbb R^n$. Using (2) this implies $$ |f(y) - f(x) - \nabla f(x)^T(y-x)| \le \frac12 \|y-x\|^2. $$ Now take $d\in \mathbb R^n$. Then the inequality above implies $$ f(y+d) - f(x) - \nabla f(x)^T(y+d-x) \le \frac12 \|y+d-x\|^2\\ f(x-d) - f(y) - \nabla f(y)^T(x-d-y) \le \frac12 \|y+d-x\|^2\\ -(f(y+d) - f(y) - \nabla f(y)^Td) \le \frac 12\|d\|^2\\ -(f(x-d) - f(x) - \nabla f(x)^T(-d)) \le \frac 12\|d\|^2. $$ Addding all these inequalities results in $$ (\nabla f(x) - \nabla f(y))^T(x-y-2d) \le \|y+d-x\|^2 + \|d\|^2. $$ Let me set $g:=\nabla f(x) - \nabla f(y)$ and choose $d$ such that $x-y-2d=g$. Then $d = \frac12(x-y-g)$ and $y+d-x = -\frac12(x-y+g)$. This results in $$ \|g\|^2 \le \frac14\|x-y+g\|^2 + \frac14\| x-y-g\|^2 =\frac12 \|x-y\|^2 + \frac12\|g\|^2, $$ which implies (1).


Let now $x,y\in O\ne \mathbb R^n$. Then $B_\rho(x),B_\rho(y)\subset O$ for some $\rho>0$. In addition, $B_\rho(\lambda x+(1-\lambda y))\subset O$ for all $\lambda \in (0,1)$. Now take $\tilde x,\tilde y$ on the line between $x$ and $y$ such that $\|\tilde x-\tilde y\|<\rho$. Take $d$ such that $\tilde y+d,\tilde x-d\in O$.

As above, we get $$ (\nabla f(\tilde x) - \nabla f(\tilde y))^T(\tilde x-\tilde y-2d) \le \|\tilde y+d-\tilde x\|^2 + \|d\|^2. $$ Denote $\tilde g:=\nabla f(\tilde x) - \nabla f(\tilde y)$. Now I would like to set $d:=\frac12(\tilde x-\tilde y - t\tilde g)$ for some $t>0$ small. Note $$ \tilde y+d = \frac12(\tilde x+\tilde y) -\frac t2 \tilde g,\quad \tilde x-d = \frac12(\tilde x+\tilde y) +\frac t2 \tilde g. $$ Hence it is enough to choose $t := \min(1,\frac\rho{\|\tilde g\|})$. Putting the choice of $d$ in above inequality results in $$ t \|\tilde g\|^2 \le \frac14\|\tilde x-\tilde y+\tilde g\|^2 + \frac14\| \tilde x-\tilde y-\tilde g\|^2 =\frac12 \|\tilde x-\tilde y\|^2 + \frac{t^2}2\|\tilde g\|^2. $$ If $t=\frac\rho{\|\tilde g\|}$ then $$ \rho \|\tilde g\| \le \frac12 \|\tilde x-\tilde y\|^2 + \frac{\rho^2}2 < \rho^2. $$ Then $\|\tilde g\|<\rho$, and $t=1<\frac\rho{\|\tilde g\|}$, a contradiction. Hence $t$ is equal to $1$, and $$ \|\nabla f(\tilde x) - \nabla f(\tilde y)\| \le \|\tilde x-\tilde y\| $$ for all $\tilde x,\tilde y$ on the line between $x$ and $y$ such that $\|\tilde x-\tilde y\|< \rho $.

Now let $n>\frac{1}\rho\|x-y\|$. Divide the line between $x$ and $y$ into $n$ parts of equal length. Set $x_i:=x+\frac in(x-y)$, $i=0\dots n$. Then $\|x_i-x_{i+1}\|<\rho$ and hence $\|\nabla f(x_i) - \nabla f(x_{i+1})\| \le \|x_i-x_{i+1}\|$. Then $$ \|\nabla f(x)-\nabla f(y) \le \sum_{i=0}^{n-1} \|\nabla f(x_i) - \nabla f(x_{i+1})\| \\ \le \sum_{i=0}^{n-1}\|x_i-x_{i+1}\| \\ = \|x-y\|, $$ which finishes the proof.

Related Question