[Math] Convex functions lack saddle points

convex-optimization, hessian-matrix, multivariable-calculus, optimization

I am reading "Deep Learning" by Ian Goodfellow. On page 86, the author explains how to use the Hessian to determine whether a critical point of a multivariate function is a maximum, a minimum, or a saddle point:

At a critical point, where $ \nabla_x f(x)=0 $, we can examine the eigenvalues of the Hessian to determine whether the critical point is a local maximum, local minimum, or saddle point. When the Hessian is positive definite (all its eigenvalues are positive), the point is a local minimum. […] Likewise, when the Hessian is negative definite (all its eigenvalues are negative), the point is a local maximum. In multiple dimensions, it is actually possible to find positive evidence of saddle points in some cases. When at least one eigenvalue is positive and at least one eigenvalue is negative, we know that $x$ is a local maximum on one cross section of $f$ but a local minimum on another cross section. […] The test is inconclusive whenever all the nonzero eigenvalues have the same sign but at least one eigenvalue is zero. This is because the univariate second derivative test is inconclusive in the cross section corresponding to the zero eigenvalue.
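
To see the saddle case concretely, here is a standard illustration of this test (my own example, not from the book): take $f(x,y) = x^2 - y^2$, so that
$$ \nabla f(x,y) = (2x,\, -2y), \qquad H = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}. $$
At the critical point $(0,0)$ the eigenvalues are $2$ and $-2$: the origin is a local minimum along the $x$-axis cross section but a local maximum along the $y$-axis cross section, i.e. a saddle point.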

So far so good. On page 89, the book turns to convex optimization and says:

Convex functions – functions for which the Hessian is positive semi-definite everywhere […] are well-behaved because they lack saddle points.

But if the Hessian is positive semi-definite, some of its eigenvalues may be zero while the others are positive. I thought the test was inconclusive "whenever all the nonzero eigenvalues have the same sign but at least one eigenvalue is zero". So why does the book say that convex functions surely lack saddle points?
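
To make my worry concrete (my own toy example, not from the book), take $f(x,y) = x^2 + y^4$. It is convex, since its Hessian
$$ H(x,y) = \begin{pmatrix} 2 & 0 \\ 0 & 12y^2 \end{pmatrix} $$
is positive semi-definite everywhere, and yet at the origin the eigenvalues are $2$ and $0$: all the nonzero eigenvalues have the same sign, one eigenvalue is zero, and the test from page 86 is inconclusive there.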

Best Answer

One property of a differentiable convex function $f:\mathbb R^n \to \mathbb R$ is that if $a \in \mathbb R^n$ then $$ f(x) \geq f(a) + \langle \nabla f(a), x-a\rangle $$ for all $x \in \mathbb R^n$. It follows that if $\nabla f(a) = 0$ then the inequality reduces to $f(x) \geq f(a)$ for all $x \in \mathbb R^n$, so $a$ is a global minimizer of $f$. In particular, a differentiable convex function has no saddle points: every critical point is a global minimum, so the inconclusive case of the eigenvalue test never conceals a saddle.
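
To connect this back to the example in the question (same toy functions, sketched rather than fully worked): for $f(x,y) = x^2 + y^4$ the Hessian at the origin is $\operatorname{diag}(2, 0)$, so the pointwise eigenvalue test is indeed inconclusive, but since $f$ is convex and $\nabla f(0,0) = 0$, the inequality above forces $(0,0)$ to be a global minimum. Contrast this with $g(x,y) = x^2 + y^3$, which has the same Hessian $\operatorname{diag}(2, 0)$ at the origin but is not convex ($\partial^2 g / \partial y^2 = 6y < 0$ for $y < 0$): there the origin really is a degenerate critical point that is no minimum, since $g(0,y) = y^3 < 0$ for $y < 0$. The eigenvalue test is a pointwise criterion, while convexity is an everywhere condition; it is that global information, not the eigenvalues at the single critical point, that rules out saddle points.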