The derivative of the inverse square root of a Gram matrix

derivatives, matrices

Suppose I have a matrix $A \in \mathbb{R}^{m \times n}$ and compute the Gram matrix $G = A^{T} A$. I would like to take the derivative of $G^{-1/2}$ with respect to $A$. I have seen some approaches that use an eigenvalue decomposition and try to obtain the derivative of the inverse square root from the derivative of the inverse of the Gram matrix, but I cannot fully understand them. I would appreciate your help with this problem. Thanks a lot in advance.

Best Answer

$\def\v{{\rm vec}}\def\p#1#2{\frac{\partial #1}{\partial #2}}$Given the matrix $A$, define the symmetric matrices
$$\eqalign{ G &= A^TA, \qquad F^2 = G^{-1} \quad\implies\quad F = G^{-1/2} \\ }$$
then calculate their differentials, vectorize, and solve for the desired gradient. (The third line below comes from differentiating both sides of $F^2 = G^{-1}$.)
$$\eqalign{
&dG = A^TdA + dA^TA \\
&dg = \v(dG) = \Big((I\otimes A^T) + (A^T\otimes I)K\Big)da \\
&F\,dF+dF\,F = -G^{-1}dG\,G^{-1} \\
&\big((I\otimes F)+(F\otimes I)\big)df = -\big(G^{-1}\otimes G^{-1}\big)dg \\
&\big(F\oplus F\big)df = -\big(G^{-1}\otimes G^{-1}\big)dg \\
&df = -\big(F\oplus F\big)^{-1}\big(G^{-1}\otimes G^{-1}\big)dg \\
&df = -\big(F\oplus F\big)^{-1}\big(G^{-1}\otimes G^{-1}\big)\Big((I\otimes A^T) + (A^T\otimes I)K\Big)da \\
&df = B\,da \quad\implies\quad B = \p{f}{a} \\
}$$
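The vectorized gradient is easy to sanity-check numerically. Below is a short NumPy sketch (my addition, not part of the original answer) that assembles $B$ from the formulas above, using an eigendecomposition of $G$ for the inverse square root and column-major (`order="F"`) vectorization throughout; the sizes `m, n` and the random test matrix are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))

def inv_sqrt_gram_vec(a):
    # f = vec(G^{-1/2}) with G = A^T A, using column-major vec throughout
    A_ = a.reshape(m, n, order="F")
    w, V = np.linalg.eigh(A_.T @ A_)
    return (V @ np.diag(w ** -0.5) @ V.T).flatten(order="F")

# Pieces appearing in the derivation
G = A.T @ A
w, V = np.linalg.eigh(G)
F = V @ np.diag(w ** -0.5) @ V.T      # F = G^{-1/2}
Ginv = V @ np.diag(1.0 / w) @ V.T     # G^{-1} = F^2
I = np.eye(n)

# Commutation matrix K: K @ vec(X) = vec(X^T) for X of shape (m, n)
K = np.zeros((m * n, m * n))
for j in range(m):
    for k in range(n):
        K[k + j * n, j + k * m] = 1.0

dg_da = np.kron(I, A.T) + np.kron(A.T, I) @ K   # dg = (...) da
S = np.kron(I, F) + np.kron(F, I)               # Kronecker sum F ⊕ F
B = -np.linalg.solve(S, np.kron(Ginv, Ginv) @ dg_da)

# Central finite-difference check of df = B da
a0, eps = A.flatten(order="F"), 1e-6
B_fd = np.empty_like(B)
for i in range(m * n):
    e = np.zeros(m * n)
    e[i] = eps
    B_fd[:, i] = (inv_sqrt_gram_vec(a0 + e) - inv_sqrt_gram_vec(a0 - e)) / (2 * eps)

print(np.max(np.abs(B - B_fd)))   # should be tiny
```

Note that $B$ has shape $n^2 \times mn$, since $f = \v(F)$ has $n^2$ entries while $a = \v(A)$ has $mn$.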


If you're content with a vectorized result then you can stop here.
If you require the full matrix-by-matrix gradient, then read on.

A pair of zero-one third-order tensors $$\eqalign{ {\vec\nu}_{\ell jk} &= \begin{cases} 1\quad{\rm if}\;\;\ell=j+km-m \\ 0\quad{\rm otherwise} \\ \end{cases} \\ {\vec\mu}_{jk\ell} &= \; {\vec\nu}_{\ell jk} \\ {\tt1}\le&j\le m,\quad {\tt1}\le k\le n \\ }$$ can be used to convert a variable between its vector $(\vec\nu)$ and matrix $(\vec\mu)$ forms $$\eqalign{ a &= \vec\nu:A \quad&\iff\quad A=\vec\mu\cdot a \\ }$$ (The bounds above correspond to the $m\times n$ matrix $A$; for the $n\times n$ matrix $F$, use the analogous tensors with $m$ replaced by $n$.) They allow the above result to be converted from a vector-by-vector (aka matrix) gradient into a matrix-by-matrix (aka fourth-order tensor) gradient $$\eqalign{ df &= B\cdot da \\ \vec\mu\cdot df &= \vec\mu\cdot B\cdot (\vec\nu:dA) \\ dF &= \big(\vec\mu\cdot B\cdot\vec\nu\big):dA \\ \p{F}{A} &= \vec\mu\cdot B\cdot\vec\nu \\ }$$ or in component notation $$\eqalign{ \p{F_{jk}}{A_{pq}} &= \sum_{\varepsilon=1}^{n^2}\sum_{\ell=1}^{mn} {\vec\mu}_{jk\varepsilon} B_{\varepsilon\ell} {\vec\nu}_{\ell pq} \\ }$$
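In NumPy, the $\vec\mu\cdot B\cdot\vec\nu$ contraction is nothing more than a reshape of $B$, since $\vec\mu$ and $\vec\nu$ merely reindex between vector and matrix forms. Here is a sketch (again my own, with arbitrary sizes `m, n`) that builds $B$ as in the answer, reshapes it into the fourth-order tensor, and checks the result against a finite-difference directional derivative.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3
A = rng.standard_normal((m, n))

def F_of(A_):
    # F = (A^T A)^{-1/2} via eigendecomposition
    w, V = np.linalg.eigh(A_.T @ A_)
    return V @ np.diag(w ** -0.5) @ V.T

# Rebuild the vectorized gradient B exactly as in the answer
G = A.T @ A
w, V = np.linalg.eigh(G)
F = V @ np.diag(w ** -0.5) @ V.T
Ginv = V @ np.diag(1.0 / w) @ V.T
I = np.eye(n)
K = np.zeros((m * n, m * n))
for j in range(m):
    for k in range(n):
        K[k + j * n, j + k * m] = 1.0   # K @ vec(X) = vec(X^T)
B = -np.linalg.solve(np.kron(I, F) + np.kron(F, I),
                     np.kron(Ginv, Ginv) @ (np.kron(I, A.T) + np.kron(A.T, I) @ K))

# With column-major vec, the mu.B.nu contraction is a pure reshape:
#   (dF/dA)[j, k, p, q] = B[j + k*n, p + q*m]   (0-based indices)
dFdA = B.reshape(n, n, m, n, order="F")

# Check against a finite-difference directional derivative dF = (dF/dA) : dA
dA = rng.standard_normal((m, n))
eps = 1e-6
dF_fd = (F_of(A + eps * dA) - F_of(A - eps * dA)) / (2 * eps)
dF_tensor = np.einsum("jkpq,pq->jk", dFdA, dA)
print(np.max(np.abs(dF_fd - dF_tensor)))   # should be tiny
```

The reshape works because both $\vec\mu$ and $\vec\nu$ encode the same column-major vec convention used to form $B$ in the first place.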


In the preceding, $\oplus$ denotes the Kronecker sum, $K$ denotes the commutation matrix associated with Kronecker products, and a colon denotes the double-dot (aka trace or Frobenius) product $$\eqalign{ A:Z &= \sum_{i=1}^m \sum_{j=1}^n A_{ij} Z_{ij} \;=\; {\rm Tr}(AZ^T) \\ A:A &= \big\|A\big\|^2_F \\ }$$
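These identities are easy to verify numerically. A small NumPy sketch (my own addition, with arbitrary sizes) checking the commutation matrix, the Kronecker sum, and the Frobenius product:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 3

# Commutation matrix K: K @ vec(X) = vec(X^T) for X in R^{m x n} (column-major vec)
K = np.zeros((m * n, m * n))
for j in range(m):
    for k in range(n):
        K[k + j * n, j + k * m] = 1.0

X = rng.standard_normal((m, n))
assert np.allclose(K @ X.flatten(order="F"), X.T.flatten(order="F"))

# Kronecker sum F ⊕ F = (I ⊗ F) + (F ⊗ I) acts as Y -> F Y + Y F on vec(Y)
# (for symmetric F, as in the answer)
F = rng.standard_normal((n, n))
F = F + F.T
Y = rng.standard_normal((n, n))
Ksum = np.kron(np.eye(n), F) + np.kron(F, np.eye(n))
assert np.allclose(Ksum @ Y.flatten(order="F"), (F @ Y + Y @ F).flatten(order="F"))

# Frobenius (double-dot) product  A:Z = sum_ij A_ij Z_ij = Tr(A Z^T)
A = rng.standard_normal((m, n))
Z = rng.standard_normal((m, n))
assert np.isclose(np.sum(A * Z), np.trace(A @ Z.T))

print("all identities verified")
```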