The Hessian of $x \mapsto\log \det \left( A^T A + R^T \operatorname{diag}(x)^{-1} R \right)$

Tags: hessian-matrix, matrices, matrix-calculus, multivariable-calculus, scalar-fields

This is a follow-up to a previous question I asked regarding the Hessian of a similar log determinant. The log determinant I am considering is given by
$$
L(\vec{x}) = \log \det \left( A^T A + R^T D_x^{-1} R \right),
$$

where $x \in \mathbb{R}^q$ has all positive entries, $D_x = \operatorname{diag}(x)$ is a diagonal matrix, $A \in \mathbb{R}^{p \times n}$, and $R \in \mathbb{R}^{q \times n}$.

What is the Hessian of $L(x)$ with respect to $x = (x_1, \ldots, x_q)^T$?

I have determined that the gradient can be written as
$$
\nabla_x L(x) = - \operatorname{diag}\left( D_x^{-1} R \left( A^T A + R^T D_x^{-1} R \right)^{-1} R^T D_x^{-1} \right).
$$

Using the identity $\operatorname{diag}(PQ) = \left( P \odot Q^T \right) \mathbf{1}$ for the $\operatorname{diag}$ operator, where $\mathbf{1}$ denotes the all-ones vector, this can also be written as
$$
\nabla_x L(x) = - \left( \left( D_x^{-1} R \right) \odot \left( D_x^{-1} R \left( A^T A + R^T D_x^{-1} R \right)^{-1} \right) \right) \mathbf{1}.
$$
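
Both expressions match a central-difference approximation of the gradient in a quick numerical test. Here is a minimal sketch (the small dimensions and the random $A$, $R$, $x$ are arbitrary placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, q = 4, 3, 5                        # arbitrary small test dimensions
A = rng.standard_normal((p, n))
R = rng.standard_normal((q, n))
x = rng.uniform(0.5, 2.0, q)             # positive entries, as required

def L(x):
    return np.linalg.slogdet(A.T @ A + R.T @ np.diag(1.0 / x) @ R)[1]

P = np.diag(1.0 / x) @ R                 # D_x^{-1} R
C = np.linalg.inv(A.T @ A + R.T @ np.diag(1.0 / x) @ R)
g1 = -np.diag(P @ C @ P.T)               # first form of the gradient
g2 = -(P * (P @ C)).sum(axis=1)          # Hadamard form, -(P ⊙ PC) 1

h = 1e-6                                 # central finite differences
g_fd = np.array([(L(x + h*e) - L(x - h*e)) / (2*h) for e in np.eye(q)])
print(np.allclose(g1, g2), np.max(np.abs(g1 - g_fd)))   # True, ~1e-9
```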

Any advice on how to proceed from here? I am thinking the next step to find the Hessian is to apply a product rule to this expression.

Best Answer

$ \def\L{{\cal L}} \def\l{\lambda} \def\n{\nabla} \def\o{{\tt1}} \def\BR#1{\Big(#1\Big)} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\diag#1{\op{diag}\LR{#1}} \def\Diag#1{\op{Diag}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR#1}}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $For typing convenience, define the following symmetric matrices
$$\eqalign{
X &= \Diag{x} = D_x \\
Y &= X^{-1} &\qiq dY = -Y\:dX\:Y \\
B &= \LR{A^TA+R^TYR} &\qiq dB = \;\:\, R^TdYR \\
C &= B^{-1} &\qiq dC = -C\;dB\:C \\
M &= RCR^T &\qiq dM = \; R\;dC\:R^T \\
}$$
and the Frobenius product $(:)$ with these properties
$$\eqalign{
P:Q &= \sum_{i=1}^m\sum_{j=1}^n P_{ij}Q_{ij} \;=\; \trace{P^TQ} \\
Q:Q &= \frob{Q}^2 \qquad \{ {\rm Frobenius\;norm} \} \\
P:Q &= Q:P \;=\; Q^T\!:P^T \\
R:\LR{PQ} &= \LR{RQ^T}:P \;=\; \LR{P^TR}:Q \\
}$$
Use the above notation to write the objective function. With the aid of Jacobi's formula, it is straightforward to calculate its differential and gradient
$$\eqalign{
\L &= \log(\det(B)) \\
d\L &= B^{-T} : dB \\
&= B^{-1} : \LR{-R^TY\:dX\:YR} \\
&= -\LR{YMY}:dX \\
&= -\diag{YMY}:dx \\
\grad{\L}{x} &= -\diag{YMY} \;\doteq\; g \\
\grad{\L}{X} &=\; \Diag{g} \;\doteq\; G \\
}$$
This result validates your own gradient calculation.
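
As a quick sanity check, take the scalar case $p=q=n=\o$ with $A=a$ and $R=r$. Then
$$\L = \log\LR{a^2+\frac{r^2}{x}} \qiq \grad{\L}{x} = \frac{-r^2/x^2}{a^2+r^2/x} = -YMY$$
since $Y=\frac{\o}{x}$ and $M=\frac{r^2}{a^2+r^2/x}$, in agreement with the formula for $g$.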

At this point, recall these identities for the $\op{diag}$ operator (the second holds when the middle factor $\Diag{v}$ is diagonal)
$$\eqalign{
\diag{ABC} &= \diag{C^TB^TA^T} \\
\diag{A\,\Diag{v}\,C} &= \LR{A\odot C^T}v \\
}$$
Then to calculate the Hessian, start with the differential of $g$
$$\eqalign{
dg &= -\diag{\,YM\:\c{dY} + Y\:\c{dM}\:Y + \c{dY}\,MY} \\
&= -\diag{2YM\;\c{dY} + Y\:\c{dM}\:Y} \\
&= \diag{2YMY\:\c{dX}\:Y - YMY\:\c{dX}\:YMY} \\
&= -2YG\,\c{dx} - \LR{YMY\odot YMY}\c{dx} \\
\grad{g}{x} &= -2YG - \LR{YMY\odot YMY} \;\doteq\; H \\
}$$
where the fourth line uses $\diag{YMY\:\c{dX}\:Y} = \LR{YMY\odot Y}\c{dx} = -YG\,\c{dx}$, since $Y$ is diagonal and $\Diag{\diag{YMY}} = -G$. Since the $H$ matrix is constructed from symmetric and/or diagonal matrices, it is a symmetric matrix (as required of a Hessian).
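
Since the sign of the first term is easy to get wrong, a finite-difference comparison of $H$ against the gradient is a useful confirmation. Here is a minimal, self-contained sketch (the small dimensions and the random $A$, $R$, $x$ are arbitrary placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, q = 4, 3, 5                        # arbitrary small test dimensions
A = rng.standard_normal((p, n))
R = rng.standard_normal((q, n))
x = rng.uniform(0.5, 2.0, q)             # positive entries

def grad(x):
    Y = np.diag(1.0 / x)
    C = np.linalg.inv(A.T @ A + R.T @ Y @ R)
    return -np.diag(Y @ R @ C @ R.T @ Y)         # g = -diag(Y M Y)

Y = np.diag(1.0 / x)
C = np.linalg.inv(A.T @ A + R.T @ Y @ R)
YMY = Y @ R @ C @ R.T @ Y
G = np.diag(grad(x))                             # G = Diag(g)
H = -2 * Y @ G - YMY * YMY                       # closed-form Hessian

h = 1e-6                                         # finite differences of g
H_fd = np.array([(grad(x + h*e) - grad(x - h*e)) / (2*h) for e in np.eye(q)])
print(np.max(np.abs(H - H_fd)))                  # agreement to ~1e-7
```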