[Math] Strong convexity of squared $\ell_p$ norm in Bregman divergence

convex optimization, convex-analysis, linear algebra, lp-spaces, optimization

I was reading a tutorial on Bregman divergence and Mirror Descent
https://www2.cs.uic.edu/~zhangx/teaching/bregman.pdf
and have a question on the strong convexity of the squared $\ell_p$ norm.

In particular, for a finite-dimensional vector $x\in\mathbb R^d$, consider $1\leq p<\infty$ and define $\|x\|_p := (\sum_{i=1}^d{|x_i|^p})^{1/p}$. Define $\psi_p(x) := \frac{1}{2}\|x\|_p^2$ and the corresponding Bregman divergence
$$
\Delta_p(x,y) := \psi_p(x)-\psi_p(y)-\langle \nabla\psi_p(y), x-y\rangle.
$$
Note that for $p=1$ the potential $\psi_p(x)$ is not differentiable, and subdifferentials should be considered (e.g., $\nabla\psi_1(y) = \|y\|_1\,\mathrm{sgn}(y)$ when every coordinate of $y$ is non-zero).

Question: Does there exist a positive number $\sigma>0$, such that $\Delta_p(x,y)\geq \frac{\sigma}{2}\|x-y\|_p^2$ for all $x$ and $y$?

The inequality clearly holds for $p=2$. In fact, for my purposes I only need to establish the above for the $p=1$ case, but it would be nice to understand the general $\ell_p$ case as well.
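For concreteness, here is a minimal numerical sketch of these quantities (the helper names are just illustrative, and it assumes $p>1$ so that $\psi_p$ is differentiable at the sampled points); it computes the ratio $\Delta_p(x,y)\big/\tfrac12\|x-y\|_p^2$ on random pairs:

```python
import numpy as np

# Illustrative sketch; assumes p > 1 so that psi_p is differentiable at the
# (almost surely non-zero-coordinate) points sampled below.

def psi(x, p):
    return 0.5 * np.linalg.norm(x, p) ** 2                      # psi_p(x) = 0.5 ||x||_p^2

def grad_psi(y, p):
    n = np.linalg.norm(y, p)
    return n ** (2 - p) * np.sign(y) * np.abs(y) ** (p - 1)     # gradient of psi_p

def bregman(x, y, p):
    return psi(x, p) - psi(y, p) - grad_psi(y, p) @ (x - y)     # Delta_p(x, y)

rng = np.random.default_rng(0)
for p in (2.0, 4.0):
    ratios = [bregman(x, y, p) / (0.5 * np.linalg.norm(x - y, p) ** 2)
              for x, y in (rng.standard_normal((2, 3)) for _ in range(10_000))]
    # For p = 2 the ratio is identically 1; for p > 2 the answer below shows
    # that the infimum of this ratio over all pairs (x, y) is actually 0.
    print(p, min(ratios))
```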

Best Answer

I see this question is very old, but hopefully someone else finds it and can benefit from an answer.

Answer: No.

First, this clever solution demonstrates the failure when $p=1$ or $p=\infty$, which was your original interest.

Strict/Strong convexity of non-Euclidean norm

Thank you @daw for pointing out an error in my original computation. The answer is still "no" when $p > 2$ and $n \ge 2$. In a new comment after the (fixed) version of my original answer, I explain why the answer is "yes" when $n=1$, with any norm in place of $\| \cdot \|_{p}$.

Let $F(x) = \frac{1}{2} \|x\|_{p}^{2}$. Assume $2 < p < \infty$ so that $\| \cdot \|_{p}$ (and consequently $F$) is $C^{2}$.

In particular, if $x$ is near enough to $x_{0}$, $$ F(x) = F(x_{0}) + DF(x_{0}) \cdot (x-x_{0}) + \frac{1}{2}(x-x_{0})^{\top} D^{2}F(x_{0}) (x-x_{0}) + o(|x-x_{0}|^{2}). $$

So, we compute $$\partial_{i}F(x) = \|x\|_{p}^{2-p} x_{i} |x_{i}|^{p-2}$$ and $$\partial_{ij}F(x) = \delta^{i}_{j} (p-1) \left|\frac{x_{i}}{\|x\|_{p}} \right|^{p-2} - (p-2) \frac{x_{i} |x_{i}|^{p-2}}{\|x\|_{p}^{p-1}} \frac{x_{j} |x_{j}|^{p-2}}{\|x\|_{p}^{p-1}}.$$
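As a rough sanity check on these formulas, here is a short sketch (the function names and the test point are arbitrary choices) comparing them with central finite differences at a point whose coordinates are all non-zero:

```python
import numpy as np

# Sketch: compare the closed-form gradient/Hessian of F(x) = 0.5 * ||x||_p^2
# with central finite differences; the point x below is an arbitrary choice
# with no zero coordinates.

p, h = 4.0, 1e-5
x = np.array([0.7, -1.3, 0.4])
I = np.eye(len(x))

def F(x):
    return 0.5 * np.linalg.norm(x, p) ** 2

def grad_F(x):
    n = np.linalg.norm(x, p)
    return n ** (2 - p) * x * np.abs(x) ** (p - 2)               # ||x||^{2-p} x_i |x_i|^{p-2}

def hess_F(x):
    n = np.linalg.norm(x, p)
    u = x * np.abs(x) ** (p - 2) / n ** (p - 1)                  # x_i |x_i|^{p-2} / ||x||^{p-1}
    return (p - 1) * np.diag(np.abs(x / n) ** (p - 2)) - (p - 2) * np.outer(u, u)

fd_grad = np.array([(F(x + h * e) - F(x - h * e)) / (2 * h) for e in I])
fd_hess = np.array([(grad_F(x + h * e) - grad_F(x - h * e)) / (2 * h) for e in I])

print(np.max(np.abs(fd_grad - grad_F(x))))   # should be close to zero
print(np.max(np.abs(fd_hess - hess_F(x))))   # should be close to zero
```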

Let $e_{1}$ denote the first standard basis vector. Then, $$ \partial_{ij}F(e_{1}) = \begin{cases} 1 & (i,j) = (1,1) \\ 0 & \text{otherwise} \end{cases}. $$ In particular, if $n = 2$ (or, if $n$ is larger, pad all matrices/vectors below with zeroes), taking $x_{0} = e_{1}$ and $x = e_{1} + t e_{2}$ we have $$ (x-x_{0})^{\top} (D^{2}F)(e_{1})(x-x_{0}) = \begin{pmatrix} 0 & t \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ t \end{pmatrix} = 0. $$ So, using the Taylor expansion above to bound $\Delta_{p}(e_{1}+ te_{2}, e_{1})$ for $t$ small, we discover $$ \Delta_{p}(e_{1}+ t e_{2},e_{1}) = o(t^{2}). $$ Since $\|t e_{2}\|_{p}^{2} = t^{2}$, no fixed $\sigma > 0$ can satisfy $\Delta_{p}(x,y)\geq \frac{\sigma}{2}\|x-y\|_p^2$ for all $x$ and $y$.
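A quick numerical illustration of this degeneracy (the helper names are illustrative; any $p>2$ exhibits the same behaviour):

```python
import numpy as np

# Sketch: for p > 2 the Bregman divergence along the flat direction e_2 at the
# base point e_1 decays faster than t^2, so Delta_p(e_1 + t*e_2, e_1) / t^2 -> 0
# and no uniform sigma > 0 can exist.

p = 4.0

def F(x):
    return 0.5 * np.linalg.norm(x, p) ** 2

def grad_F(x):
    n = np.linalg.norm(x, p)
    return n ** (2 - p) * x * np.abs(x) ** (p - 2)

def bregman(x, y):
    return F(x) - F(y) - grad_F(y) @ (x - y)

e1, e2 = np.eye(2)
for t in (1e-1, 1e-2, 1e-3):
    print(t, bregman(e1 + t * e2, e1) / t ** 2)   # ratio shrinks roughly like t^(p-2)
```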

It might be interesting to convert this into the language of convex functions and see if a similar statement holds for $1 < p < 2$.

Added when corrections were done: Note that if $f \in C^{2}(\mathbb{R}^{n} \setminus \{0\})$ is any norm and $F(x) = \frac{1}{2} f(x)^{2}$, then $DF(x) = f(x) Df(x)$ and $D^{2}F(x) = f(x) D^{2}f(x) + Df(x) (Df(x))^{\top}$.

Since $f$ is one-homogeneous, $Df(x)$ is non-zero for $x \neq 0$ (indeed, Euler's identity gives $Df(x) \cdot x = f(x) \neq 0$ by the non-degeneracy of norms), and $Df(x) (Df(x))^{\top}$ is therefore a rank-one, positive semidefinite matrix.

Since $f$ is convex, $f(x) D^{2}f(x)$ is positive semidefinite. In particular, the sum $D^{2}F(x) = f(x) D^{2}f(x) + Df(x) (Df(x))^{\top}$ has all non-negative eigenvalues and at least one that is positive. This and the Taylor expansion above show why, when $n=1$, there exists $c_{x} > 0$ so that $\Delta_{p}(x,y) \ge c_{x} \|x-y\|^{2}$ for $\|x-y\|$ small; indeed, when $n=1$ any norm has the form $f(x) = f(1)|x|$, so $F(x) = \frac{1}{2}f(1)^{2}x^{2}$ and the Bregman divergence equals $\frac{1}{2}f(1)^{2}(x-y)^{2}$ exactly. The same local bound holds with any norm in place of $\| \cdot \|_{p}$.
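For what it is worth, here is a small numerical sketch of this identity and of the eigenvalue claim, with $f = \|\cdot\|_p$ for some $p > 2$ and central finite differences standing in for the exact derivatives (the test point and step size are arbitrary choices):

```python
import numpy as np

# Sketch: numerically check D^2 F = f * D^2 f + Df Df^T for F = 0.5 * f^2 with
# f = ||.||_p, and inspect the eigenvalues of D^2 F (non-negative, at least one
# positive, up to finite-difference error). Point x and step h are arbitrary.

p, h = 4.0, 1e-4
x = np.array([0.9, -0.5, 1.1])
I = np.eye(len(x))

f = lambda z: np.linalg.norm(z, p)
F = lambda z: 0.5 * f(z) ** 2

def fd_grad(g, x):
    return np.array([(g(x + h * e) - g(x - h * e)) / (2 * h) for e in I])

def fd_hess(g, x):
    return np.array([[(g(x + h * (ei + ej)) - g(x + h * (ei - ej))
                       - g(x - h * (ei - ej)) + g(x - h * (ei + ej))) / (4 * h * h)
                      for ej in I] for ei in I])

lhs = fd_hess(F, x)
rhs = f(x) * fd_hess(f, x) + np.outer(fd_grad(f, x), fd_grad(f, x))

print(np.max(np.abs(lhs - rhs)))    # small, limited by finite-difference error
print(np.linalg.eigvalsh(lhs))      # non-negative, with at least one strictly positive
```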