[Math] Proof for strongly convex function is strictly convex

convex optimizationconvex-analysis

The following is my proof for the title:

We have to show that a strongly convex function $f$ meets the following $$f(\lambda x + (1-\lambda)y)<\lambda f(x) + (1-\lambda)f(y). $$

By the definition of the strongly convex function $f$:
\begin{align*}
f(y)\geq f(x) + \langle \nabla f(x),y-x\rangle + \frac{m}{2}\|y-x\|_2^2,
\end{align*}where $m>0$. Consider both sides of the definition of strictly convex $f$:

  1. The right-hand side:
    \begin{align*}
    &\lambda f(x)+(1-\lambda)f(y) \\ &\geq \lambda f(x) + (1-\lambda) \bigg[f(x) + \langle \nabla f(x),y-x\rangle + \frac{m}{2}\|y-x\|_2^2 \bigg] \\ & = f(x) + (1-\lambda)\langle \nabla f(x),y-x\rangle + \frac{m}{2}(1-\lambda) \|y-x\|_2^2
    \end{align*}
  2. The left-hand side:
    \begin{align*}
    &f(\lambda x + (1-\lambda)y) \\ &\geq f(x) + \langle \nabla f(x),(\lambda x + (1-\lambda)y)-x\rangle + \frac{m}{2}\|(\lambda x + (1-\lambda)y)-x\|_2^2 \\ &=
    f(x) + (1-\lambda)\langle \nabla f(x),y-x\rangle + \frac{m}{2}(1-\lambda)^2\|y-x\|_2^2
    \end{align*}

Since $\lambda \in (0,1)$, $(1-\lambda)>(1-\lambda)^2$; therefore, we prove that $\lambda f(x)+(1-\lambda)f(y) > f(\lambda x + (1-\lambda)y)$


My question: There is no evidence show that this is a tight lower bound.
EX: $10>9$ and $20>1$, we cannot say that $(20-10) > (1 – 9)$. How to fix this problem?

Best Answer

To prove that a strongly convex function is convex, take the definition of strongly convex:

$f(y) \geq f(x) + \nabla f(x)^T(y-x) + \frac{m}{2}||x-y||^2$

clearly:

$f(y) > f(x) + \nabla f(x)^T(y-x)$, as $\frac{m}{2}||x-y||^2$ is positive by definition.

Now take $z = \lambda x + (1-\lambda) y$, by the strong convexity of $f$, we get:

$f(x) > f(z) + \nabla f(z)^T(x-z) = f(\lambda x + (1-\lambda) y) + \nabla f(z)^T(1-\lambda)(x-y)$

and

$f(y) > f(z) + \nabla f(z)^T(y-z) = f(\lambda x + (1-\lambda) y) + \nabla f(z)^T\lambda(y-x)$

Then add those equations with the coefficient $\lambda, (1-\lambda)$ respectively. The gradients terms will cancel and you will be left with:

$\lambda f(x) + (1-\lambda) f(y) > f(\lambda x + (1-\lambda) y)$, which is the definition of strict convexity