Lipschitz implies bounded gradient with any norm

derivativeslipschitz-functionsreal-analysis

Assume I have a differentiable function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ that is $L$-Lipschitz with respect to a norm $\|\cdot\|$ on $\mathbb{R}^n$:
$$ \forall x, y \in \mathbb{R}^n, |f(x) – f(y)| \le L\|x-y\|. $$
I know that if $f$ is further assumed to be convex, and $\|\cdot\|$ is the euclidian norm, then convexity implies
$$ L\|\nabla f(x)\|_2 \ge f(x + \nabla f(x)) – f(x) \ge \langle \nabla f(x), \nabla f(x)\rangle = \| \nabla f(x)\|_2^2, $$
thus $\|\nabla f(x)\|_2 \le L$.

But does $\|\nabla f(x)\| \le L$ still hold if $\|\cdot\|$ is any norm on $\mathbb{R}^n$? And if $f$ is only assumed differentiable?

Best Answer

Recall that for a differentiable function $f : \mathbb{R}^n \to \mathbb{R}$, its gradient is defined thanks to the euclidean structure of $\mathbb{R}^n$: it is the only vector field such that for all $x$ and $u$ $$ \mathrm{d}_xf\cdot u = \langle \nabla f(x),u\rangle. $$

In what follows, all gradients $\nabla f$ are the classic euclidean gradients defined thanks to the usual euclidean structure.

An important result is that if $f$ is Lipschitz for some norm, then it is almost everywhere differentiable, and is Lipschitz for any norm. Moreover, the euclidean gradient is then defined almost everywhere.

Notice that for the euclidean norm, one has $\|\nabla f(x)\|_2 = \|\mathrm{d}_xf\|_2$, where this second norm is the operator norm subordinate to the euclidean one: it is because by Cauchy-Schwarz inequality: $$ |\mathrm{d}_xf\cdot u| = |\langle \nabla f (x),u\rangle| \leqslant \|\nabla f (x) \|_2 \|u\|_2, $$ hence $\|\mathrm{d}_xf\| \leqslant \|\nabla f(x)\|_2$, and equality is reached for $u = \nabla f(x)$. So saying $f$ is $L$-Lipschitz show that $\|\nabla f\|_2 \leqslant L$ almost everywhere.

But if you consider another norm on $\mathbb{R}^n$, there is no reason for (the above vector field) $\nabla f$ to stay bounded with the same constant: indeed, the equivalence of norms in a finite dimensional vector space assures that for any norm $\|\cdot\|$, there exists a constant $C>0$ such that $$ \|\nabla f(x)\| \leqslant C\cdot \|\nabla f (x) \|_2. $$ A counter example to having the same constant $L$ is just the following: suppose $f(x) = \langle u,x \rangle$ for a given $u \in \mathbb{R}$ and for $\langle\cdot,\cdot\rangle$ the usual inner product. Then $\nabla f \equiv u$ on $\mathbb{R}^n$, hence $\|\nabla f \|_2 = \|u\|_2$. But if one measures $\nabla f$ thanks to $\|\cdot\|_{\lambda}=\lambda \|\cdot \|_2$ with $\lambda >0$, one has $$ \|\nabla f \|_{\lambda} = \lambda \|u\|_2 $$ which is strictly greater than $L = \|u\|_2$ if $\lambda >1$.

To conclude, note that that I never used any convexity assumption, but I used only the Lipschitz one.