The derivative of $\log \det X$ when $X$ is symmetric

Tags: derivatives, determinant, matrices, matrix-calculus, scalar-fields

According to Appendix A.4.1 of Boyd & Vandenberghe's Convex Optimization, the gradient of $f(X):=\log \det X$ is

$$\nabla f(X) = X^{-1}$$
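
As a quick sanity check of what this means, here is a small numerical sketch of my own (not from the book): for a random symmetric positive definite $X$ and a small symmetric perturbation $\Delta X$, the first-order expansion $\log\det(X+\Delta X)\approx \log\det X + \text{trace}(X^{-1}\Delta X)$ should hold, and that is the sense in which the gradient is $X^{-1}$. The `numpy` test below reflects my own choice of test matrix and step size.

```python
# Check log det(X + dX) ~= log det(X) + trace(X^{-1} dX)
# for a random symmetric positive definite X and a small symmetric dX.
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
X = A @ A.T + n * np.eye(n)             # symmetric positive definite
dX = rng.standard_normal((n, n))
dX = 1e-6 * (dX + dX.T)                 # small symmetric perturbation

logdet = lambda M: np.linalg.slogdet(M)[1]
lhs = logdet(X + dX) - logdet(X)
rhs = np.trace(np.linalg.solve(X, dX))  # trace(X^{-1} dX)
print(lhs, rhs)                         # agree up to second-order terms
```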

The domain of $f$ here is the set of symmetric matrices $\mathbf S^n$. However, according to the book "Matrix Algebra from a Statistician's Perspective" by D. Harville, the gradient of $\log \det X$ for a symmetric $X$ must be (see eq. 8.12 of the book)

$$\nabla \log \det X = 2 X^{-1} - \text{diag} (y_{11}, y_{22}, \dots, y_{nn})$$

where $y_{ii}$ denotes the $i$th diagonal element of $X^{-1}$. Now, I'm not a mathematician, but Harville's formula seems correct to me, because he makes use of the fact that the entries of $X$ are not "independent". Indeed, for the case where the entries are independent, Harville provides another formula (eq. 8.8 of his book), which matches that of Boyd & Vandenberghe.
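
For instance, a small finite-difference experiment of my own (a sketch, not taken from either book) reproduces Harville's matrix when the entries $X_{ij}$ with $i \le j$ are treated as the free variables and $X_{ji}$ is moved together with $X_{ij}$ to keep $X$ symmetric:

```python
# Entrywise partial derivatives of log det X with respect to the free entries
# X_ij (i <= j), keeping X symmetric; the result matches 2 X^{-1} - diag(X^{-1}).
import numpy as np

rng = np.random.default_rng(1)
n, h = 4, 1e-6
A = rng.standard_normal((n, n))
X = A @ A.T + n * np.eye(n)             # symmetric positive definite
logdet = lambda M: np.linalg.slogdet(M)[1]

G = np.zeros((n, n))
for i in range(n):
    for j in range(i, n):
        E = np.zeros((n, n))
        E[i, j] = E[j, i] = 1.0         # symmetric perturbation of the (i, j) entry
        G[i, j] = G[j, i] = (logdet(X + h * E) - logdet(X - h * E)) / (2 * h)

Y = np.linalg.inv(X)
print(np.allclose(G, 2 * Y - np.diag(np.diag(Y)), atol=1e-5))   # True
```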

Is this an error in the book of Boyd & Vandenberghe, or am I missing something here? To me it does look like an error, but at the same time I find that extremely unlikely: the book is very popular, and if this were an error it would already be in the errata; it's much more likely that I'm misunderstanding something. This formula has already been mentioned in many questions on this website, but no question or answer that I have seen mentions (the possibility of) the $\log \det X$ formula in Boyd & Vandenberghe being wrong.


Edit based on the response of Profs. Boyd & Vandenberghe

Prof. Boyd kindly responded to my email about this issue, providing an explanation that he and Prof. Vandenberghe think can account for the discrepancy between the two formulas. In essence, their reply suggests that the discrepancy may be due to the choice of inner product. To explain why, I need to summarize their proof in Appendix A.4.1 of the Convex Optimization book.

The proof is based on the idea that the derivative of a function gives the first-order approximation of the function. That is, the derivative of $f$ at $X$ can be obtained by finding a matrix $D$ that satisfies

$$f(X+\Delta X) \approx f(X)+\langle D,\Delta X\rangle.$$

In the book, Boyd & Vandenberghe use the trace inner product $\langle A,B\rangle = \text{trace}(AB)$ as $\langle \cdot, \cdot \rangle$, and show that

$$f(X+\Delta X) \approx f(X)+\text{trace}(X^{-1}\Delta X).$$
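
For readers without the book at hand, this expansion can be reached roughly as follows (my own sketch of the standard argument, for positive definite $X$ and small symmetric $\Delta X$, with $\lambda_i$ denoting the eigenvalues of $X^{-1/2}\Delta X X^{-1/2}$):

$$\begin{aligned}
\log\det(X+\Delta X) &= \log\det\!\big(X^{1/2}(I + X^{-1/2}\Delta X X^{-1/2})X^{1/2}\big) \\
&= \log\det X + \sum_{i=1}^n \log(1+\lambda_i) \\
&\approx \log\det X + \sum_{i=1}^n \lambda_i \;=\; \log\det X + \text{trace}(X^{-1}\Delta X).
\end{aligned}$$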

The book is publicly available; how they arrive at this expression can be seen in Appendix A.4.1. In their reply, Prof. Boyd suggests that they suspect the discrepancy stems from the choice of inner product. While they used the trace inner product, he suggests that other people may use $\langle A,B\rangle = \sum_{i\le j} A_{ij}B_{ij}$ instead. The authors claim that this can explain the discrepancy (although I'm not sure whether they looked at the proof of Harville or of others for implicit or explicit use of this inner product), because the trace inner product puts twice as much weight on the off-diagonal entries.
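
The point can be checked numerically (my own sketch, not part of the correspondence): for a symmetric direction $\Delta X$, pairing $X^{-1}$ with $\Delta X$ under the trace inner product gives the same number as pairing $2X^{-1}-\text{diag}(y_{11},\dots,y_{nn})$ with $\Delta X$ under $\sum_{i\le j} A_{ij}B_{ij}$.

```python
# Same directional derivative, expressed with two different
# (gradient, inner product) pairs.
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
X = A @ A.T + n * np.eye(n)
Y = np.linalg.inv(X)                    # X^{-1}
dX = rng.standard_normal((n, n))
dX = dX + dX.T                          # arbitrary symmetric direction

iu = np.triu_indices(n)                 # index pairs with i <= j
H = 2 * Y - np.diag(np.diag(Y))         # Harville-style gradient

d_trace = np.trace(Y @ dX)              # <X^{-1}, dX> = trace(X^{-1} dX)
d_upper = np.sum(H[iu] * dX[iu])        # <H, dX> summing only over i <= j
print(np.isclose(d_trace, d_upper))     # True
```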


Best Answer

This is a really well done paper that describes what is going on:

Shriram Srinivasan and Nishant Panda (2020), "What is the gradient of a scalar function of a symmetric matrix?", arXiv:1911.06491, https://arxiv.org/pdf/1911.06491.pdf

Their conclusion is that Boyd's formula is the correct one: it is obtained by restricting the Fréchet derivative (defined on $\mathbb{R}^{n \times n}$) to the subspace of symmetric $n \times n$ matrices, denoted $\mathbb{S}^{n \times n}$. Deriving the gradient in the reduced space of $n(n+1)/2$ dimensions and then mapping back to $\mathbb{S}^{n \times n}$ is subtle and cannot be done so simply, which is what leads to Harville's inconsistent result.
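
To make the subtlety concrete, here is a small sketch of my own (not taken from the paper): the plain gradient with respect to the $n(n+1)/2$ free entries, unpacked naively into a symmetric matrix, reproduces Harville's expression, whereas recovering the gradient on $\mathbb{S}^{n \times n}$ with respect to the trace inner product requires halving the off-diagonal components on the way back.

```python
# Gradient in the reduced space of n(n+1)/2 free entries, and two ways of
# mapping it back to a symmetric matrix.
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
X = A @ A.T + n * np.eye(n)
Y = np.linalg.inv(X)

iu = np.triu_indices(n)                                   # free entries: i <= j
grad_vech = np.where(iu[0] == iu[1], Y[iu], 2 * Y[iu])    # d f / d X_ij for i <= j

# Naive unpacking gives Harville's matrix ...
H = np.zeros((n, n))
H[iu] = grad_vech
H = H + H.T - np.diag(np.diag(H))
print(np.allclose(H, 2 * Y - np.diag(np.diag(Y))))        # True

# ... while halving the off-diagonal components recovers X^{-1}, the gradient
# with respect to the trace inner product on the symmetric matrices.
G = np.zeros((n, n))
G[iu] = np.where(iu[0] == iu[1], grad_vech, grad_vech / 2)
G = G + G.T - np.diag(np.diag(G))
print(np.allclose(G, Y))                                  # True
```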