Gradient always points away from the minima of convex functions

convex-optimization, convex-analysis

I'm reading a paper by Léon Bottou, "Online Learning and Stochastic Approximations". He studies online learning with cost functions $C(w)$ satisfying two conditions:

  1. $C(w)$ has a single minimum $w^*$.
  2. $C(w)$ satisfies $\inf_{\|w-w^*\|>\epsilon} (w-w^*)\cdot\nabla_w C(w) > 0$ for every $\epsilon>0$, where the infimum is taken over all $w$ in the domain with $\|w-w^*\|>\epsilon$. In words, the gradient always points away from the minimum.

Here $w$ is a vector in $\mathbb R^n$.
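
As a quick sanity check of what condition 2 asserts, here is a minimal numerical sketch (not from the paper) for one particular strictly convex cost, a least-squares quadratic; the names `A`, `b`, `C`, `grad_C`, and the sampling scheme are all illustrative assumptions.

```python
import numpy as np

# Hypothetical example: C(w) = ||A w - b||^2 with A of full column rank,
# so the minimizer w* is unique.  We sample points with ||w - w*|| > eps
# and check that (w - w*) . grad C(w) stays strictly positive.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))
b = rng.normal(size=5)

def C(w):
    r = A @ w - b
    return r @ r

def grad_C(w):
    return 2 * A.T @ (A @ w - b)

# Unique minimizer of the least-squares cost.
w_star, *_ = np.linalg.lstsq(A, b, rcond=None)

eps = 0.1
samples = w_star + rng.normal(size=(1000, 3)) * 5.0
samples = samples[np.linalg.norm(samples - w_star, axis=1) > eps]
values = [(w - w_star) @ grad_C(w) for w in samples]
print(min(values) > 0)   # expected: True for this convex quadratic
```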

The author states that "this condition is weaker than the usual notion of convexity." But it is not completely clear to me why. My question is this:

Do all strictly convex functions $C(w)$ satisfy these conditions?

The first condition is essentially immediate, since a strictly convex function has at most one minimum (excluding cases where the extremum lies on the boundary of the domain). So my question is really about the second condition.

Best Answer

Not every strictly convex function has a minimum, but assume $C$ does; then it is indeed unique. Now consider the scalar product $(w^*-w)\cdot\nabla_w C(w)$ between the vector from the point $w$ to the minimum and the gradient at $w$. The gradient is orthogonal to the level set $\{x\colon C(x)=C(w)\}$ and points outwards, the sublevel set $\{x\colon C(x)\le C(w)\}$ is convex, and the minimum $w^*$ lies in the interior of this sublevel set because $C(w^*)<C(w)$. Hence the gradient and the vector $w^*-w$ point strictly into opposite half-spaces with respect to the tangent hyperplane at $w$. So yes, if a strictly convex function is differentiable, then it satisfies $-(w^*-w)\cdot\nabla_w C(w)=(w-w^*)\cdot\nabla_w C(w)>0$ for every $w\neq w^*$. To pass from this pointwise inequality to the infimum condition (say, when the domain is all of $\mathbb R^n$), note that along each ray $t\mapsto w^*+tu$ the quantity $u\cdot\nabla_w C(w^*+tu)$ is nondecreasing in $t$, so the infimum over $\|w-w^*\|>\epsilon$ is bounded below by the minimum of $(w-w^*)\cdot\nabla_w C(w)$ over the compact sphere $\|w-w^*\|=\epsilon$, which is positive by continuity of the gradient.
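
For completeness, here is an alternative route to the same pointwise inequality that avoids the level-set picture, using only the first-order characterization of strict convexity (and assuming, as above, that $C$ is differentiable). Strict convexity gives, for every $w\neq w^*$,
$$C(w^*) > C(w) + \nabla_w C(w)\cdot(w^*-w),$$
and since $w^*$ is the minimizer, $C(w^*)\le C(w)$. Combining the two inequalities,
$$(w-w^*)\cdot\nabla_w C(w) > C(w)-C(w^*) \ge 0.$$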
