Derivative of Neural Network – Cost Function of Matrices

Tags: derivatives, linear-algebra, matrices, matrix-calculus, neural-networks

I have a generative neural network model for the numerical simulation of a density matrix $\hat{\sigma}_\Omega$, where $\Omega$ denotes the set of parameters of the neural network.

I wish to optimize this network with respect to the Trace Distance between its output state and a chosen, fixed target state $\hat{\rho}$:
\begin{equation}
\mathcal{D}(\hat{\rho},\hat{\sigma}_\Omega) = {1\over2}\|\hat{\rho} - \hat{\sigma}_\Omega \|_1 = {1\over2}\text{Tr}\sqrt{(\hat{\rho} - \hat{\sigma}_\Omega)^\dagger(\hat{\rho} - \hat{\sigma}_\Omega)},
\end{equation}

using gradient descent, which requires computing the derivative:
\begin{equation}
\frac{\partial\mathcal{D}(\hat{\rho},\hat{\sigma}_\Omega)}{\partial \Omega_i}.
\end{equation}

How does one compute this derivative? Can one perhaps use a chain-rule-like expression,
\begin{equation}
\frac{\partial \mathcal{D}(\hat{\rho},\hat{\sigma}_\Omega)}{\partial \Omega_i} = \text{Tr}\Big[ \Big(\frac{\partial \mathcal{D}(\hat{\rho},\hat{\sigma}_\Omega)}{\partial \hat{\sigma}_\Omega}\Big)^\textsf{T} \cdot \frac{\partial \hat{\sigma}_\Omega}{\partial \Omega_i}\Big]
\end{equation}

but I don't think this holds up when the states are (in general) complex. Any help or direction is much appreciated.
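As a concrete reference point, the trace distance itself is straightforward to evaluate numerically: the trace norm $\|X\|_1$ is the sum of the singular values of $X$. A minimal sketch (the $2\times2$ states below are illustrative, not from the question):

```python
import numpy as np

def trace_distance(rho, sigma):
    """D(rho, sigma) = 0.5 * ||rho - sigma||_1, the trace (nuclear) norm
    being the sum of singular values of the difference."""
    diff = rho - sigma
    s = np.linalg.svd(diff, compute_uv=False)  # singular values
    return 0.5 * np.sum(s)

# hypothetical example: pure state |0><0| vs. the maximally mixed state
rho = np.array([[1.0, 0.0], [0.0, 0.0]])
sigma = np.eye(2) / 2
print(trace_distance(rho, sigma))  # → 0.5
```

For Hermitian differences the singular values coincide with the absolute eigenvalues, so an eigendecomposition works equally well.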

Best Answer

For ease of typing, define $$\eqalign{ &w=\Omega,\quad P=\hat\rho,\quad S=\hat\sigma_\Omega \\ &F = \tfrac{1}{2}\Big(S-P\Big)\,\Big((S-P)^T(S-P)\Big)^{-1/2} \\ &{G}=\frac{\partial S}{\partial w} \quad\implies {G}_{ijk}=\frac{\partial S_{ij}}{\partial w_{k}} }$$ The differential of the nuclear norm of a real matrix $X$ can be written as $$d\|X\|_1 = X(X^TX)^{-1/2}:dX$$ where the colon is a convenient product notation for the trace, i.e. $\;A:B=\operatorname{Tr}(A^TB)$.

Set $X=(S-P)$ and calculate the gradient of the distance function. $$\eqalign{ {\cal D} &= \tfrac{1}{2}\|X\|_1 \\ d{\cal D} &= \tfrac{1}{2}X(X^TX)^{-1/2}:dX \\ &= F:dS \\ &= F:G\,dw \\ \frac{\partial{\cal D}}{\partial w} &= F:G \\ }$$ Or in component form $$\eqalign{ \frac{\partial{\cal D}}{\partial w_k} &= \sum_i\sum_j F_{ij}\,\frac{\partial S_{ij}}{\partial w_{k}} \\ }$$
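The component formula above can be checked numerically against finite differences. The sketch below assumes a hypothetical linear parameterization $S(w)=\sum_k w_k B_k$ with random real matrices (so $G_{ijk}=({B_k})_{ij}$ is constant), and uses the SVD identity $X(X^TX)^{-1/2}=UV^T$ for $X=U\Sigma V^T$ of full rank:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4

P = rng.standard_normal((n, n))          # fixed target
B = rng.standard_normal((m, n, n))       # G_{ijk} = B[k][i, j]

def S(w):
    # hypothetical linear model: S(w) = sum_k w_k * B_k
    return np.einsum('k,kij->ij', w, B)

def D(w):
    # trace distance: half the sum of singular values of X = S - P
    X = S(w) - P
    return 0.5 * np.sum(np.linalg.svd(X, compute_uv=False))

w = rng.standard_normal(m)
X = S(w) - P
U, s, Vt = np.linalg.svd(X)
F = 0.5 * U @ Vt                         # F = (1/2) X (X^T X)^{-1/2}

# analytic gradient: dD/dw_k = sum_ij F_ij * dS_ij/dw_k
analytic = np.einsum('ij,kij->k', F, B)

# central finite differences for comparison
eps = 1e-6
numeric = np.array([(D(w + eps * np.eye(m)[k]) - D(w - eps * np.eye(m)[k])) / (2 * eps)
                    for k in range(m)])
print(np.max(np.abs(analytic - numeric)))  # small (finite-difference error)
```

A generic random $X$ has full rank, so the inverse square root exists; at rank-deficient points the trace norm is not differentiable and the formula only gives a subgradient.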
