[Math] Why does positive (semi-)definiteness imply convexity

convex optimizationconvex-analysishessian-matrixpositive-semidefinite

According to https://en.wikipedia.org/wiki/Positive-definite_matrix: "It turns out that the (hessian) matrix M (of a multi-dimensional function) is positive definite if and only if it is symmetric and its quadratic form is a strictly convex function." i.e. Convexity seems to imply positive (semi-)definiteness.

Is there an intuitive (possibly geometric) explanation for why this is the case?

I know that the diagonals of the hessian matrix of a function give the curvature of that function along the respective dimensions. For convexity to hold, the multidimensional function must have a positive curvature in every dimension (all diagonals >= 0) and in every possible combination of those dimensions.
How does positive (semi-)definiteness ensure this?

Best Answer

Suppose function $f : \mathbb{R}^d\rightarrow \mathbb{R}$ is twice differentiable over its domain. We want to prove $\forall x: \nabla^2 f(x)\succeq 0$ if and only if $f(\cdot)$ is convex.

Convexity $\Rightarrow$ Positive semi-definite Hessian \ The first order characterisation of convexity is: $$f(y)\ge f(x) + \nabla f(x)^\top (y-x)$$ (i) one dimensional case: $d=1$

For $d=1$ we only need to prove $f''(x)\ge 0$. Pick two arbitrary points $x,y$, and wlog assume $y>x$. Using convexity we have $$f(y) \ge f(x) + f'(x)(y-x)$$ If we switch the variables $x,y$ and rewrite the equation we get $$f(y) \le f(x) + f'(y)(y-x)$$ Combining the two: $$f(x) + f'(x)(y-x) \le f(x) + f'(y)(y-x)$$ and finally by cancelling the two $f(x)$ terms and dividing by $y-x$ (assumed to be positive) we'll get: $$f'(x) \le f'(y)$$ Meaning the function $f'(x)$ must be monotonically non-decreasing. Now we can prove that $f''(x)\ge 0$, using the definition of a derivative: $$f''(x) = \lim_{h\rightarrow 0} \frac{f'(x+h)-f'(x)}{h}$$ if we have $f''(x)<0$ then there must be $h>0$ such that $f'(x+h)-f'(x)<0$ because of the convergence of the limit. However this contradicts the result we got before that derivative must be monotonically non-decreasing.

(ii) General case $d>1:$

Now going back to the general case, lets assume we have an arbitrary point $x$ and direction $v$ in $\mathbb{R}^d$. Now define $g: \mathbb{R}\rightarrow\mathbb{R}$: $$g(t) := f(x + t v) $$ It's easy to prove that $g(\cdot)$ is convex and twice differentiable. Using chain rule, we can compute the second derivative as: $$g''(t) = v^\top \nabla ^2 f(x + t v ) v$$ Using the result from $d=1$, convexity of $g$ implies $g''(t)\ge 0$ for all $t$. In particular for $t=0$: $$v^\top \nabla^2 f(x) v=g''(0) \ge 0$$ Now because $v$ was chosen arbitrarily, it means in every direction the term must be positive, which implies semipositive definiteness of $\nabla^2f(x)$.

PSD Hessian $\Rightarrow$ Convexity The proof strategy is very similar. First we prove it for $d=1$ case and then generalise it using the same trick.

Related Question