Consider a constrained optimization problem
$$
\underset{\beta\in\mathbb R^p}{\text{minimize}}f(\beta)\quad\text{such that}\ \beta\in\mathcal C,
$$
where $f:\mathbb R^p\to\mathbb R$ is a convex objective function to be minimized, and $\mathcal C\subset\mathbb R^p$ is a convex constraint set.
When the objective function $f$ is differentiable, a necessary and sufficient condition for a vector $\beta^*\in\mathcal C$ to be a global minimizer is that
$$
\langle\nabla f(\beta^*),\beta-\beta^*\rangle\ge0
$$
for all $\beta\in\mathcal C$.
I am looking for an intuitive explanation of this condition.
Using the first-order condition for convexity, $f(\beta)\ge f(\beta^*)+\langle\nabla f(\beta^*),\beta-\beta^*\rangle\ge f(\beta^*)$ for all $\beta\in\mathcal C$, so the condition is indeed sufficient. The condition also says that the angle between $\nabla f(\beta^*)$ and $\beta-\beta^*$ is non-obtuse (at most $90^\circ$) for every $\beta\in\mathcal C$. But why does this mean that $\beta^*$ is a minimizer?
Any help is much appreciated!
Best Answer
At a minimizer, the directional derivative in every feasible direction must be nonnegative; otherwise we could decrease the objective value by moving a small step in a feasible direction along which the directional derivative is negative. For any $\beta\in\mathcal C$, the direction $\beta-\beta^*$ is feasible, since convexity of $\mathcal C$ keeps the segment from $\beta^*$ to $\beta$ inside $\mathcal C$, and the directional derivative of $f$ at $\beta^*$ in that direction is exactly $\langle\nabla f(\beta^*),\beta-\beta^*\rangle$. So the condition says precisely that no feasible direction is a descent direction.
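This can be checked numerically on a toy problem. The sketch below (my own illustration, not from the question) minimizes $f(\beta)=\|\beta-c\|^2$ over the box $\mathcal C=[0,1]^p$, whose minimizer is the Euclidean projection of $c$ onto the box, and then verifies that $\langle\nabla f(\beta^*),\beta-\beta^*\rangle\ge0$ holds at many randomly sampled feasible points:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5

# Toy problem: minimize f(beta) = ||beta - c||^2 over the box C = [0, 1]^p.
c = 2.0 * rng.normal(size=p)

# The unique minimizer over the box is the Euclidean projection of c onto it.
beta_star = np.clip(c, 0.0, 1.0)

def grad_f(beta):
    # Gradient of f(beta) = ||beta - c||^2.
    return 2.0 * (beta - c)

# Verify the optimality condition <grad f(beta*), beta - beta*> >= 0
# at many randomly sampled feasible points beta in C.
betas = rng.uniform(0.0, 1.0, size=(10_000, p))
inner = (betas - beta_star) @ grad_f(beta_star)
print(inner.min() >= -1e-12)  # True: no feasible direction is a descent direction
```

At coordinates where $c_i\in[0,1]$ the gradient component vanishes, and at clipped coordinates the gradient component and the feasible displacement have opposite signs, so every term of the inner product is nonnegative.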