$\eta$ – strongly convex functions definition

convex-analysisgradient descent

I'm trying to understand better the following definition:

"A function $f:\mathbb{R}^d \rightarrow \mathbb{R}$ is $\eta$ – strongly convex with $\eta > 0$ if for all $\alpha, p \in \mathbb{R}^d$ then

$$f(p) \le f(\alpha) + \langle\nabla f(\alpha),p-\alpha\rangle + \frac{\eta}{2}\|p-\alpha\|^2."$$

"Intuitively, we have that $f$ is at least quadratic (why?), that is along any direction $u = \frac{p-\alpha}{\|p-\alpha\|}$ the function $\langle f , u \rangle$ is 1-dimensional and its second derivative is strictly positive. Which is the connection for $f$ being at least quadratic? Thank you.

EDIT:

the inequality is exactly like that. I took this result from the book "Mathematical foundations for Data Analysis" By Jeff Philips found here.

https://www.cs.utah.edu/~jeffp/M4D/M4D-v0.4.pdf

enter image description here

Best Answer

Intuitively, we can think of Lipschitz-smoothness and strong convexity as duals to one another (for many reasons, convex conjugation is one good one but I will explore a different one here). A convex differentiable function $f$ is Lipschitz-smooth with constant $L>0$ iff, for any $x,y$ it holds,

$$f(y) - f(x) - \langle\nabla f(x), y-x\rangle \leq \frac{L}{2}\|x-y\|^2.$$

Meanwhile, a convex function $f$ is strongly convex with constant $\eta>0$ iff, for any $x,y$ it holds,

$$f(y) - f(x) - \langle\nabla f(x), y-x\rangle \geq \frac{\eta}{2}\|x-y\|^2.$$

So what's really going on? Lipschitz-smoothness corresponds to a quadratic upper bound and strong convexity corresponds to a quadratic lower bound. In fact, these inequalities can both be derived from the convexity of the difference functions,

$$g_1(x) = f(x) - \frac{1}{2}\|x\|^2\quad\mbox{and}\quad g_2(x) = \frac{1}{2}\|x\|^2 - f(x)$$

When $g_1$ is convex for all $x$, the function $f$ is strongly convex. When $g_2$ is convex for all $x$, the function $f$ is Lipschitz-smooth. You can introduce constants $L$ and $\eta$ here as well.

Unrelated to your question, but if you know about Bregman divergences then this whole thing can be rewritten. Let $D_f(x,y)$ be the Bregman divergence associated to $f$. Then, we can say that $f$ is relatively smooth with respect to $F$ iff $\exists L>0$ such that, for any $x,y$,

$$D_f(x,y) \leq L D_F(x,y)$$

and similarly, $f$ is relatively strongly convex with respect to $F$ iff $\exists \eta > 0$ such that, for any $x,y$,

$$D_f(x,y) \geq \eta D_F(x,y).$$

So smoothness and strong convexity are really describing the domination of $D_f$ by another Bregman divergence $D_F$. We can recover Lipschitz-smoothness and strong convexity by taking $F(x) = \frac{1}{2}\|x\|^2$ but we are free to explore alternatives!