Derivatives of log determinants of matrix products with respect to diagonal matrices

hessian-matrix, matrices, matrix-calculus, numerical-linear-algebra

I am looking for advice on computing gradients (and possibly the Hessian) of a log determinant, from someone who is better versed in matrix calculus than I am. I have broken my question into three parts. The log determinant I am considering is given by
$$
L(\vec{d}_1, \vec{d}_2) = \log \det \left( A^T D_1 A + B^T D_2 B \right),
$$

with $D_1 = \operatorname{diag}(\vec{d}_1) \in \mathbb{R}^{p \times p}$ and $D_2 = \operatorname{diag}(\vec{d}_2) \in \mathbb{R}^{q \times q}$ both diagonal matrices, and $A \in \mathbb{R}^{p \times n}$, $B \in \mathbb{R}^{q \times n}$. The matrices $A$ and $B$ are very large: I only have access to matrix-vector products with them and cannot factorize them. The gradients I would like to calculate are
$$
\nabla_{\vec{d}_1} L \quad \text{ and } \quad \nabla_{\vec{d}_2} L.
$$

Question 1: Is there a simple analytical form for these gradients? Using a matrix calculus calculator, it seems the simplest expressions I am able to get are
$$
\begin{align}
\nabla_{\vec{d}_1} L &= \operatorname{diag}\left(A \left( A^T D_1 A + B^T D_2 B \right)^{-1} A^T \right), \\
\nabla_{\vec{d}_2} L &= \operatorname{diag}\left(B \left( A^T D_1 A + B^T D_2 B \right)^{-1} B^T \right).
\end{align}
$$
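
To make these expressions concrete, here is a small dense NumPy check of the claimed $\nabla_{\vec{d}_1} L$ against central finite differences. This is only a sanity-check sketch at toy sizes (in my actual problem the matrices cannot be formed densely):

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n = 4, 3, 5
A = rng.standard_normal((p, n))
B = rng.standard_normal((q, n))
d1 = rng.uniform(0.5, 2.0, p)
d2 = rng.uniform(0.5, 2.0, q)

def M(d1, d2):
    # M = A^T D1 A + B^T D2 B (symmetric positive definite for d1, d2 > 0)
    return A.T @ np.diag(d1) @ A + B.T @ np.diag(d2) @ B

def L(d1, d2):
    # log det computed via slogdet for numerical stability
    return np.linalg.slogdet(M(d1, d2))[1]

# claimed gradient: diag(A M^{-1} A^T)
grad_d1 = np.diag(A @ np.linalg.solve(M(d1, d2), A.T))

# central finite differences in each coordinate of d1
eps = 1e-6
fd = np.zeros(p)
for i in range(p):
    e = np.zeros(p); e[i] = eps
    fd[i] = (L(d1 + e, d2) - L(d1 - e, d2)) / (2 * eps)

print(np.max(np.abs(grad_d1 - fd)))  # small, ~1e-7 or below
```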

Assuming that this is the simplest form possible, here is my next question.

Question 2: What is the best way to compute repeated gradient-vector products with these gradients as $\vec{d}_1$ and $\vec{d}_2$ change? That is, what is the best way to compute $(\nabla_{\vec{d}_1} L)^T v$ for arbitrary vectors $v$? From the expressions above, I think the best I can do is to use a stochastic estimator for the diagonal of the inner expressions, where I estimate
$$
\operatorname{diag}\left(A \left( A^T D_1 A + B^T D_2 B \right)^{-1} A^T \right) \approx \left[ \sum_{j=1}^s v_j \, \odot \, A \left( A^T D_1 A + B^T D_2 B \right)^{-1} A^T v_j \right] \oslash \left[ \sum_{j=1}^s v_j \odot v_j \right]
$$

for a set of $s$ random probe vectors $\{v_j\}_{j=1}^s$. But is this the best way? It can be expensive, since every product with $\left( A^T D_1 A + B^T D_2 B \right)^{-1}$ requires solving a linear system with the conjugate gradient method or similar.
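
For concreteness, here is a matrix-free sketch of this estimator with Rademacher probes and SciPy's CG for the inner solves. It assumes only matrix-vector products with $A$, $A^T$, $B$, $B^T$; the dense arrays below are just stand-ins so the snippet runs:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(1)
p, q, n = 40, 30, 50
A = rng.standard_normal((p, n))   # stand-ins; in practice these are implicit operators
B = rng.standard_normal((q, n))
d1 = rng.uniform(0.5, 2.0, p)
d2 = rng.uniform(0.5, 2.0, q)

def M_matvec(x):
    # x -> (A^T D1 A + B^T D2 B) x using only matvecs
    return A.T @ (d1 * (A @ x)) + B.T @ (d2 * (B @ x))

M_op = LinearOperator((n, n), matvec=M_matvec, dtype=np.float64)

def estimate_grad_d1(s=200):
    # stochastic estimate of diag(A M^{-1} A^T), as in the formula above
    num = np.zeros(p)
    den = np.zeros(p)
    for _ in range(s):
        v = rng.choice([-1.0, 1.0], size=p)   # Rademacher probe
        y, info = cg(M_op, A.T @ v)           # y ~= M^{-1} A^T v (one CG solve per probe)
        num += v * (A @ y)
        den += v * v
    return num / den

# compare against the exact diagonal (only possible at toy sizes)
M_dense = A.T @ np.diag(d1) @ A + B.T @ np.diag(d2) @ B
exact = np.diag(A @ np.linalg.solve(M_dense, A.T))
print(np.linalg.norm(estimate_grad_d1() - exact) / np.linalg.norm(exact))  # shrinks like O(1/sqrt(s))
```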

Question 3: Is there a simple analytical form for the Hessian of $L$ w.r.t. $[\vec{d}_1, \vec{d}_2]$ (all of the parameters)? I have not made any progress on this myself. A similar question has been asked here, but note that there the Hessian of the log determinant of $X$ was taken w.r.t. a dense matrix $X$, whereas I am asking for the Hessian of a log determinant w.r.t. only the diagonal entries of diagonal matrices (my Hessian should be a $(p+q)\times(p+q)$ matrix). So I am hoping that some simplification can be made in my case.

Best Answer

$ \def\l{\lambda}\def\o{{\tt1}}\def\p{\partial} \def\n{\nabla}\def\g{\large{g}} \def\LR#1{\left(#1\right)} \def\diag#1{\operatorname{diag}\LR{#1}} \def\Diag#1{\operatorname{Diag}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\hess#1#2#3{\frac{\p^2 #1}{\p #2\,\p #3}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\M{M^{-1}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} $The answer to your first question is that the $\tt{diag}$ expressions are probably the best available.

For the second question, I don't know of a better approximation than that of the linked paper.

Your third question can be approached using the following $\tt{diag}$ identity, which holds whenever $X$ is a diagonal matrix
$$\eqalign{ \diag{AXB} &= \LR{B^T\odot A}\diag{X} \\ }$$
For typing convenience, define the matrix variables
$$\eqalign{ M &= M^T \;=\; \LR{A^TD_1A + B^TD_2B} \\ Y &= Y^T \;=\; \LR{A\M A^T} \\ }$$
Consider the first gradient $(\g=\n_{d_1}L)$ with respect to the first vector $(d_1)$
$$\eqalign{ \g &= \diag{A\M A^T} \;=\; \diag Y \\ d\g &= -\diag{A\M\,\c{dM}\,\M A^T} \\ &= -\diag{A\M\,\CLR{A^TdD_1A}\,\M A^T} \\ &= -\diag{Y\,dD_1\,Y} \\ &= -\LR{Y\odot Y}\diag{dD_1} \\ \n_{d_1}\g &= -\LR{Y\odot Y} \;=\; \n_{d_1}\n_{d_1}L \\ }$$
The remaining Hessian blocks $\big\{\n_{d_1}\n_{d_2}L,\;\n_{d_2}\n_{d_1}L,\;\n_{d_2}\n_{d_2}L\big\}$ can be calculated similarly; for example, repeating the calculation with $dM = \CLR{B^TdD_2B}$ yields $\n_{d_2}\g = -\LR{Z\odot Z} = \n_{d_2}\n_{d_1}L$ with $Z = A\M B^T$.
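
For completeness, a small dense numerical sketch of these blocks, checking the $d_1$-$d_1$ block against finite differences of the gradient (toy sizes only; nothing here is matrix-free as written):

```python
import numpy as np

rng = np.random.default_rng(2)
p, q, n = 4, 3, 5
A = rng.standard_normal((p, n))
B = rng.standard_normal((q, n))
d1 = rng.uniform(0.5, 2.0, p)
d2 = rng.uniform(0.5, 2.0, q)

def Minv(d1, d2):
    return np.linalg.inv(A.T @ np.diag(d1) @ A + B.T @ np.diag(d2) @ B)

def grad_d1(d1, d2):
    # g = diag(A M^{-1} A^T) = diag(Y)
    return np.diag(A @ Minv(d1, d2) @ A.T)

Mi = Minv(d1, d2)
Y = A @ Mi @ A.T          # p x p
Z = A @ Mi @ B.T          # p x q
H_d1d1 = -(Y * Y)         # block  grad_{d1} grad_{d1} L
H_d2d1 = -(Z * Z)         # cross block  grad_{d2} grad_{d1} L  (p x q)

# finite-difference check of the d1-d1 block
eps = 1e-6
fd = np.zeros((p, p))
for i in range(p):
    e = np.zeros(p); e[i] = eps
    fd[:, i] = (grad_d1(d1 + e, d2) - grad_d1(d1 - e, d2)) / (2 * eps)
print(np.max(np.abs(H_d1d1 - fd)))  # small, ~1e-6 or below
```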