Hessian of Covariance Matrices – Linear Algebra and Convex Optimization

Tags: ca.classical-analysis-and-odes, convex-optimization, convex-analysis, linear-algebra, matrices

Suppose we have a typical log-det function $\mathcal{L}$ of a covariance matrix $\mathbf{A}$,
$$
\mathcal{L}(\mathbf{A}) = \log\vert \mathbf{I} + \mathbf{A}\mathbf{S} \vert - \mathbf{q}^T(\mathbf{A}^{-1} + \mathbf{S})^{-1} \mathbf{q},
$$
where $\mathbf{S}$ is a symmetric positive semi-definite matrix, $\mathbf{q}$ is a column vector, and $\mathbf{I}$ is the identity matrix.

The questions are

  1. How can I calculate the gradient of $\mathcal{L}(\mathbf{A})$, given that $\mathbf{A}$ is a symmetric positive definite matrix? I found in the literature that if $\mathbf{A}$ is a symmetric matrix, then $\frac{\partial \log\vert \mathbf{A}\vert}{\partial \mathbf{A}}= \mathbf{A}^{-1} + \mathbf{A}^{-T} - \mathbf{I}\odot\mathbf{A}^{-1}$, where $\odot$ is the Hadamard product. However, most of the current literature simply assumes that $\frac{\partial \log\vert \mathbf{A}\vert}{\partial \mathbf{A}} = \mathbf{A}^{-T}$. Does this difference affect the gradient result? (A numerical sketch illustrating this follows the questions.)

  2. How can I calculate the Hessian of $\mathcal{L}(\mathbf{A})$, i.e., $\mathbf{H} = \frac{\partial^2 \mathcal{L}(\mathbf{A})}{\partial \mathbf{A}^2}$, which requires the derivative of a matrix with respect to a matrix? If I want to find the minima, maxima, and saddle points of $\mathcal{L}$, does the usual result still hold that they correspond to the Hessian $\mathbf{H}$ being positive definite, negative definite, or indefinite? How can I find the minima by exploiting the Hessian, which is a matrix-by-matrix derivative?
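
For question 1, the two formulas correspond to two different conventions, and a quick numerical check makes the distinction concrete. The sketch below (plain NumPy, not part of the original question) compares the unconstrained gradient $\mathbf{A}^{-T}$ with the symmetric-constrained gradient $2\mathbf{A}^{-1}-\mathbf{I}\odot\mathbf{A}^{-1}$ against symmetric finite-difference perturbations.

```python
# A quick numerical check (not from the original post) of the two gradient
# conventions for d log|A| / dA when A is symmetric positive definite.
# Assumptions: plain NumPy, symmetric perturbations, central differences.
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Random symmetric positive definite A
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)

def logdet(X):
    return np.linalg.slogdet(X)[1]

Ainv = np.linalg.inv(A)

# Unconstrained gradient: all n*n entries of A treated as independent.
G_full = Ainv.T                            # equals A^{-1} here since A is symmetric

# Symmetric-constrained gradient: A_ij and A_ji are the same variable.
G_sym = Ainv + Ainv.T - np.eye(n) * Ainv   # = 2 A^{-1} - I (Hadamard) A^{-1}

# Finite-difference check: perturbing the (i, j) parameter of a symmetric A
# moves both A_ij and A_ji when i != j.
eps = 1e-6
FD = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = 1.0
        E[j, i] = 1.0                      # symmetric perturbation
        FD[i, j] = (logdet(A + eps * E) - logdet(A - eps * E)) / (2 * eps)

print(np.allclose(FD, G_sym, atol=1e-5))   # True: matches the "symmetric" formula
print(np.allclose(FD, G_full, atol=1e-5))  # False: off-diagonal entries differ by a factor 2
```

The finite differences match the symmetric-constrained formula: when $A_{ij}$ and $A_{ji}$ are treated as a single variable, the off-diagonal gradient entries double. Whichever convention is used, it must be applied consistently; at a stationary point both conventions yield the same condition. For question 2, one common approach is to parameterize the independent entries of the symmetric $\mathbf{A}$ (its upper triangle), form the Hessian with respect to that parameter vector, and apply the usual second-order tests (positive definite at a critical point implies a local minimum, and so on) to that ordinary Hessian matrix.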

Best Answer

I am back with good news: you have nothing to do because there are (in general) no critical points! This is the case, in particular, whenever $S$ is invertible (and $n\geq 2$). Recall that $\mathcal{L}(A)=\log|I+AS|-q^T(I+AS)^{-1}Aq$ (using $(A^{-1}+S)^{-1}=(I+AS)^{-1}A$) and, according to my previous post, $D\mathcal{L}_A=\mathcal{L}'(A):H\mapsto \operatorname{tr}\big((I+AS)^{-1}HS\big)-q^T\big(-(I+AS)^{-1}HS(I+AS)^{-1}A+(I+AS)^{-1}H\big)q$.
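
Since the rest of the argument rests on this directional derivative, here is a finite-difference sanity check (a NumPy sketch, not part of the original answer; the matrices $A$, $S$, $q$, $H$ below are random placeholders).

```python
# Finite-difference check of the directional derivative D L_A(H) quoted above.
# Assumptions: NumPy, random symmetric PSD S, SPD A, symmetric direction H.
import numpy as np

rng = np.random.default_rng(1)
n = 5
B = rng.standard_normal((n, n))
S = B @ B.T                          # symmetric positive semi-definite
C = rng.standard_normal((n, n))
A = C @ C.T + np.eye(n)              # symmetric positive definite
q = rng.standard_normal((n, 1))
H = rng.standard_normal((n, n))
H = (H + H.T) / 2                    # symmetric direction, since A is symmetric

def L(A):
    M = np.eye(n) + A @ S
    return np.linalg.slogdet(M)[1] - (q.T @ np.linalg.inv(M) @ A @ q).item()

Minv = np.linalg.inv(np.eye(n) + A @ S)
analytic = np.trace(Minv @ H @ S) - (
    q.T @ (-Minv @ H @ S @ Minv @ A + Minv @ H) @ q
).item()

eps = 1e-6
numeric = (L(A + eps * H) - L(A - eps * H)) / (2 * eps)
print(analytic, numeric)             # should agree to roughly finite-difference accuracy
```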

Thus $D\mathcal{L}_A=0$ is equivalent to: for every matrix $H$, $\operatorname{tr}\big(S(I+AS)^{-1}H\big)=q^T(I+AS)^{-1}H\big(I-S(I+AS)^{-1}A\big)q$, that is, a condition of the form

(*) $\operatorname{tr}(UH)=r^THs$, where $U=S(I+AS)^{-1}$ and $r=(I+AS)^{-T}q$, $s=\big(I-S(I+AS)^{-1}A\big)q$ are vectors.

Lemma: If (*) holds for every $H$, then necessarily $\operatorname{rank}(U)\le 1$.

Proof: writing both sides entrywise, $\operatorname{tr}(UH)=\sum_{i,j}u_{j,i}h_{i,j}$ and $r^THs=\sum_{i,j}r_is_jh_{i,j}$ for every choice of the $h_{i,j}$. Matching coefficients gives $u_{j,i}=r_is_j$, that is, $U=sr^T$. Therefore $\operatorname{rank}(U)\leq 1$. $\square$
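
To make the coefficient-matching step concrete, here is a tiny numerical illustration (a NumPy sketch, not part of the original proof): with $U=sr^T$, the identity $\operatorname{tr}(UH)=r^THs$ holds for every $H$, and $U$ has rank one.

```python
# Numerical illustration of the identity used in the lemma:
# tr(UH) = r^T H s for all H exactly when U = s r^T (a rank-1 matrix).
import numpy as np

rng = np.random.default_rng(2)
n = 4
r = rng.standard_normal((n, 1))
s = rng.standard_normal((n, 1))
U = s @ r.T                          # the rank-1 matrix the lemma predicts

for _ in range(3):
    H = rng.standard_normal((n, n))
    lhs = np.trace(U @ H)
    rhs = (r.T @ H @ s).item()
    print(np.isclose(lhs, rhs))      # True for every direction H

print(np.linalg.matrix_rank(U))      # 1
```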

Conclusion: Here $A>0$ and $S\geq 0$; thus $AS$ is similar to $A^{1/2}SA^{1/2}\geq 0$, so $I+AS$ is invertible and $\operatorname{rank}(U)=\operatorname{rank}(S)$. Consequently, $\mathcal{L}$ admits critical points only if $\operatorname{rank}(S)\leq 1$, that is, $S=ww^T$ for some vector $w$.
