Prove function $f(X) := \text{tr} \left( X^{-1} A \right)$ is convex

convex-analysismatrix-calculusmultivariable-calculus

Let $S^{n}_{+}$ and $S^{n}_{++}$ denote the set of positive semidefinite and positive definite (symmetric) $n \times n$ matrices, respectively. Let function $f : S^{n}_{++} \to \mathbb R$ be defined by

$$f(X) := \text{tr} \left( X^{-1} A \right)$$

where $A \in S^{n}_{+}$. When $f$ is differentiable, given that

$$\nabla_X f = – X^{-1}AX^{-1}$$

can we show that $f$ is convex via the use of the equivalent proposition of convexity

$$\langle\nabla f(X)-\nabla f(Y),\ X-Y\rangle \ge 0$$

where $\langle \cdot, \cdot\rangle$ denotes the inner product of $S^{n}_{++}$? Or can it be proven that the function is convex in a simpler way?

Best Answer

This is a formal proof, i am trying to give a proof following the lines suggested in the OP.

We may assume that $A$ is moreover positive definite, since any positive semidefinite $A$ can be approximated by a sequence of positive definite matrices. Now we can precompose with the linear, invertible map $g:Y\to A^{1/2}YA^{1/2}$ and show equivalently that the composition $$ \begin{aligned} Y\to g(Y) &\to f(g(Y)) \\ &= \operatorname{Trace}\Big(\ g(Y)^{-1}A\ \Big) \\ &= \operatorname{Trace}\Big(\ (A^{1/2}YA^{1/2})^{-1}A\ \Big) \\ &= \operatorname{Trace}\Big(\ A^{-1/2}Y^{-1}A^{-1/2}\; A\ \Big) \\ &= \operatorname{Trace}\Big(\ Y^{-1}\;A^{-1/2}AA^{-1/2}\ \Big) \\ &= \operatorname{Trace}\Big(\ Y^{-1}\ \Big) \end{aligned} $$ is a convex map. This is rather for making typing easy in the following. So we assume without loss of generality $A=I=1$, the unit matrix. Now we fix a point $X$ in the space of $n\times n$ positive definite matrices, and try to exhibit formally a Taylor expansion for $f:X\to \operatorname{Trace}\Big(\ X^{-1}\ \Big)$. For this, we consider a small deformation using a (symmetric) $H$-difference for / from $X$ and compute the expansion for $f(X+H)-f(X)$: $$ \begin{aligned} f(X+H)-f(X) &= \operatorname{Trace}\Big(\ (X+H)^{-1}\ \Big) - \operatorname{Trace}\Big(\ X^{-1}\ \Big) \\ &= \operatorname{Trace}\Big(\ (X+H)^{-1} - X^{-1}\ \Big) \\ &= \operatorname{Trace}\Big(\ X^{-1/2}(I+X^{-1/2}HX^{-1/2})^{-1}X^{-1/2} - X^{-1}\ \Big) \\ &= \operatorname{Trace}\Big(\ X^{-1/2} \Big[\ (I+\underbrace{X^{-1/2}HX^{-1/2}}_{=:U\text{ notation}})^{-1} -I\ \Big] X^{-1/2} \ \Big) \\ &= \operatorname{Trace}\Big(\ X^{-1/2} \Big[\ (I-U+U^2-U^3+U^4-U^5+\dots) -I\ \Big] X^{-1/2} \ \Big) \\ &= \operatorname{Trace}\Big(\ X^{-1/2} \Big[\ -U+U^2+O(\|H\|^3)\ \Big] X^{-1/2} \ \Big) \\ &= \operatorname{Trace}\Big(\ X^{-1/2} \Big[\ -X^{-1/2}HX^{-1/2} +X^{-1/2}HX^{-1}HX^{-1/2} +O(\|H\|^3)\ \Big] X^{-1/2} \ \Big) \\ &= \operatorname{Trace}\Big(\ -X^{-1}HX^{-1} +X^{-1}HX^{-1}HX^{-1} +O(\|H\|^3) \ \Big) \\ &= \operatorname{Trace}\Big(\ -X^{-1}HX^{-1} \ \Big) + \operatorname{Trace}\Big(\ X^{-1}HX^{-1}HX^{-1} \ \Big) +O(\|H\|^3) \ . \end{aligned} $$ So the first derivative of $f$, computed in the point $X$, is the linear map $H\to f'(X)(H):=\operatorname{Trace}\Big(\ -X^{-1}HX^{-1} \ \Big)$. (Seen as a map from the tangent space in $X$ of the space $S_{++}^n$ with values in $\Bbb R$.)

The convexity addresses the next term in the expansion, necessarily $$ \operatorname{Trace}\Big(\ X^{-1}HX^{-1}HX^{-1} \ \Big) = \operatorname{Trace}\Big(\ X^{-1/2}\; \underbrace{(X^{-1/2}HX^{-1/2})^2}_{=U^2\in S_+^n}\; X^{-1/2} \ \Big) \ge 0 \ . $$ Strictly speaking we have not finished the proof, because "in some directions" $H$ may vanish and maybe the other terms in the Taylor expansion gain the upper hand. But the argument can be accurately hot ironed maybe simplest by considering the full expression $U^2-U^3+U^4-\dots=U(I+U)^{-1}U$ instead.

Related Question