I am studying Boyd & Vandenberghe's *Convex Optimization* and have a question about page 642. According to the definition there, the derivative $Df(x)$ gives the first-order (affine) approximation
$$f(z) \approx f(x)+Df(x)(z-x),$$
and when $f$ is real-valued (i.e., $f : \Bbb R^n \to \Bbb R$), the gradient is
$$\nabla{f(x)}=Df(x)^{T}$$
But when discussing the gradient of the function $f(X)=\log\det X$, the author says "we can identify $X^{-1}$ as the gradient of $f$ at $X$".
Where did the trace $\operatorname{tr}(\cdot)$ go?
Best Answer
First of all, if you write (for a general function $f: U \to \mathbb R$, where $U \subset \mathbb R^K$)
$$f(y) \approx f(x) + Df(x) (y-x),$$
then the term $Df(x) (y-x)$ is really
$$\sum_{i=1}^K D_i f \ (y_i - x_i).$$
Now the function $Z\mapsto \log\det (Z)$ is defined on the open set $S^n_{++}$ of symmetric positive definite matrices, viewed as a subset of $\mathbb R^{n^2}$, so it has $n^2$ coordinates, given by $Z_{ij}$ for $i, j = 1, \cdots, n$.
Now take a look at
$$\begin{split} \text{tr} \left( X^{-1} (Z-X)\right) &= \sum_{i=1}^n \left(X^{-1} (Z-X) \right)_{ii}\\ &= \sum_{i=1}^n \sum_{j=1}^n X^{-1}_{ij} (Z_{ji}-X_{ji}) \\ \end{split}$$
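The double sum above says that the trace expression is just the standard inner product of $(X^{-1})^T$ with $Z-X$, taken entrywise over the $n^2$ coordinates. A quick numerical sketch (the matrices here are my own illustrative choices, not from the book):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
X = A @ A.T + n * np.eye(n)        # a symmetric positive definite X (assumed setup)
M = rng.standard_normal((n, n))    # plays the role of Z - X

Xinv = np.linalg.inv(X)

# Matrix form: tr(X^{-1} M)
trace_form = np.trace(Xinv @ M)

# Coordinate form: the coefficient of M_ji is (X^{-1})_ij,
# i.e. the entrywise inner product of (X^{-1})^T with M.
coord_form = np.sum(Xinv.T * M)

print(trace_form, coord_form)  # the two agree
```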
Comparing with the coordinate form of $Df(x)(y-x)$ above, the coefficient of $Z_{ji}-X_{ji}$ is $X^{-1}_{ij}$, so strictly speaking we should identify $(X^{-1})^T$ as the gradient of $\log \det$. But since $X \in S^n_{++}$ is symmetric, $(X^{-1})^T = X^{-1}$, which recovers the book's statement. The trace did not go anywhere: it is simply the inner product on $\mathbb R^{n^2}$ written in matrix form.
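One can also confirm the conclusion directly by finite differences over the $n^2$ coordinates $Z_{ij}$ (again with an illustrative random $X$ of my own choosing): the numerical gradient of $\log\det$ matches $(X^{-1})^T$, which equals $X^{-1}$ when $X$ is symmetric.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
X = A @ A.T + n * np.eye(n)    # symmetric positive definite point (assumed setup)

def f(Z):
    return np.log(np.linalg.det(Z))

# Central finite-difference gradient, treating Z as n^2 independent coordinates.
eps = 1e-6
G = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = eps
        G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

# G should match (X^{-1})^T, which for symmetric X is just X^{-1}.
err = np.max(np.abs(G - np.linalg.inv(X).T))
print(err)
```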