Inequality produced by $L$-Lipschitz continuous gradient

lipschitz-functionsoptimization

Here they say:

If the first derivative of $f$ is $L$-Lipschitz continuous, then
$$f(y)\leq f(x)+\langle \nabla f(x),y-x\rangle+\dfrac{L}{2}\|y-x\|^2.$$

I don't understand how that's true from the fact that $\nabla f$ being $L$-Lipschitz continuous. I know that $\nabla f$ being $L$-Lipschitz continuous implies that $\|\nabla f(y)-\nabla f(x)\|\leq L\|f(x)-f(y)\|$, but I don't know how to derive the other inequality.

Certainly $\|y-x\|^2\geq 0$, but what do we know about the other terms? I'd appreciate if someone can give me a hint how to show that or forward some resource.

Best Answer

Consider $$g(t)= f(x)-f( x+t(y-x))+\langle \nabla f(x),y-x\rangle t+\dfrac{L}{2}\|y-x\|^2t^2$$ on $[0,1]$
You see that $$ \begin{align}g'(t)&= \langle \nabla f(x)-\nabla f(x+t(y-x)),y-x\rangle+L\| y-x\|^2t \\ &\ge - \|\nabla f(x)-\nabla f(x+t(y-x))\| \|x-y\|+L\|y-x\|^2t \\ &\stackrel{\text{Lipschitz}}{\ge} -L \| t(y-x)\| \|y-x\|+L\|y-x\|^2t=0 \end{align}$$ Hence $g$ is increasing on $[0,1]$, hence $g(1) \ge g(0)=0$.
$\square$

Related Question