Finding the Gradient Matrix for the given expression

derivatives, linear-algebra, matrices, matrix-calculus, quantum-information

Let $\rho$ be a matrix, and let $\rho_A$ be the partial trace of the matrix $\rho$. For simplicity, let us assume $\rho$ is a $4 \times 4$ matrix. The partial trace is defined as follows:

If
$$\rho = \begin{bmatrix}a & b & c & d \\
e & f & g & h \\
i & j & k & l \\
m & n & o & p \end{bmatrix}$$

Then
$$\rho_A = \begin{bmatrix} a+f & c+h \\i+n & k + p \end{bmatrix}$$
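For concreteness, the partial trace above can be sketched in NumPy. This is only an illustration, assuming the $4\times 4$ matrix is indexed as $\rho_{(2i+j),(2k+l)}$ so that the second two-dimensional factor is the one traced out; the function name `partial_trace_second` is made up for this example.

```python
import numpy as np

def partial_trace_second(rho):
    """Trace out the second 2-dimensional factor of a 4x4 matrix."""
    rho = np.asarray(rho)
    # View rho[2*i + j, 2*k + l] as a 4-way array r[i, j, k, l].
    r = rho.reshape(2, 2, 2, 2)
    # Sum over j == l, leaving a 2x2 matrix indexed by (i, k).
    return np.einsum("ijkj->ik", r)

# Label the entries 0..15 in reading order, so a=0, b=1, ..., p=15.
rho = np.arange(16).reshape(4, 4)
rho_A = partial_trace_second(rho)
print(rho_A)  # rows [5, 9] and [21, 25], i.e. a+f, c+h, i+n, k+p
```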

I would like to calculate the following:
$$\nabla_\rho \Big({\rm Tr}(\rho \log \rho) - {\rm Tr}(\rho_A \log \rho_A)\Big)$$

This is the derivative of a continuous scalar quantity with respect to a matrix, so I should be able to calculate it, right?

Note that the entries of $\rho$ are complex.

I tried a few things, but I'm not sure whether the chain rule works, because you end up with gradients of matrices with respect to matrices, and I don't think those are defined.

Any help on how to go about this would be much appreciated.

Best Answer

Consider a scalar function, its derivative, and its differential.
$$\eqalign{
\phi &= {\rm Tr}\big(\alpha \log(\alpha)\big) \quad\implies\quad \frac{d\phi}{d\alpha} = \log(\alpha) + 1 \cr
d\phi &= \big(\log(\alpha) + 1\big)\,d\alpha \cr
}$$
When the argument is a square matrix $(A)$, this becomes
$$\eqalign{
\phi &= {\rm Tr}\big(A\log(A)\big) \cr
d\phi &= \big(\log(A)+I\,\big)^T:dA \cr
}$$
where a colon denotes the trace product, i.e. $\,A:B = {\rm Tr}(A^TB)$.
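The matrix formula can be sanity-checked numerically. The sketch below is my own illustration, not part of the derivation: it builds a hypothetical non-symmetric test matrix $A = S\,{\rm diag}(d)\,S^{-1}$ with positive eigenvalues (so $\log A$ is known in closed form), evaluates ${\rm Tr}(A\log A)$ as $\sum_i \lambda_i\log\lambda_i$ over the eigenvalues, and compares central finite differences against $(\log A + I)^T$. It assumes real entries that are all varied independently.

```python
import numpy as np

# Hypothetical test matrix A = S diag(d) S^{-1}: non-symmetric, eigenvalues d > 0.
S = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5],
              [0.0, 0.0, 1.0]])
d = np.array([0.5, 1.5, 2.5])
S_inv = np.linalg.inv(S)
A = S @ np.diag(d) @ S_inv
log_A = S @ np.diag(np.log(d)) @ S_inv  # matrix logarithm of A

def phi(M):
    """Tr(M log M), computed as the sum of lam * log(lam) over eigenvalues."""
    lam = np.linalg.eigvals(M).astype(complex)
    return np.sum(lam * np.log(lam)).real

# Gradient claimed by the differential d(phi) = (log A + I)^T : dA.
grad_formula = (log_A + np.eye(3)).T

# Central finite differences, perturbing one entry of A at a time.
eps = 1e-6
grad_fd = np.zeros((3, 3))
for a in range(3):
    for b in range(3):
        dA = np.zeros((3, 3))
        dA[a, b] = eps
        grad_fd[a, b] = (phi(A + dA) - phi(A - dA)) / (2 * eps)

max_err = np.max(np.abs(grad_fd - grad_formula))
print("max abs deviation:", max_err)
```

Because this $A$ is not symmetric, dropping the transpose in `grad_formula` makes the comparison fail, which is a quick way to see that the transpose in the formula matters.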

Now define the Cartesian basis vectors and their matrix analogs.
$$\eqalign{
e_1 &= \pmatrix{1\\0}, &\quad e_2 &= \pmatrix{0\\1} \cr
E_1 &= e_1\otimes I, &\quad E_2 &= e_2\otimes I \cr
}$$
where $I\in{\mathbb R}^{2\times 2}$ is the identity matrix, so each $E_i$ is a $4\times 2$ matrix.

The basis matrices can be used to extract $2\times 2$ blocks from the $\rho$ matrix, while the basis vectors can be used to construct the components of $\rho_A$:
$$\eqalign{
\rho_A &= e_1e_1^T\;{\rm Tr}\big(E_1^T\rho E_1\big) + e_1e_2^T\;{\rm Tr}\big(E_1^T\rho E_2\big) + e_2e_1^T\;{\rm Tr}\big(E_2^T\rho E_1\big) + e_2e_2^T\;{\rm Tr}\big(E_2^T\rho E_2\big) \cr
&= e_ie_k^T\big(E_iE_k^T:\rho\big) \cr
}$$
The final expression employs the summation convention over $(i,k)$.
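As a quick check of this construction, here is a small NumPy sketch (an illustration only; the variable names are mine) that builds $E_i = e_i\otimes I$ with `np.kron` and assembles $\rho_A$ from the block traces. The test matrix simply labels the entries $a,\dots,p$ as $0,\dots,15$.

```python
import numpy as np

I2 = np.eye(2)
# Column vectors e_1, e_2 and the 4x2 matrices E_i = e_i (x) I.
e = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]
E = [np.kron(e_i, I2) for e_i in e]

# Test matrix with a=0, b=1, ..., p=15 in reading order.
rho = np.arange(16, dtype=float).reshape(4, 4)

# rho_A = sum over (i, k) of e_i e_k^T * Tr(E_i^T rho E_k)
rho_A = sum(
    (e[i] @ e[k].T) * np.trace(E[i].T @ rho @ E[k])
    for i in range(2)
    for k in range(2)
)

# Same coefficients written as trace products (E_i E_k^T) : rho = Tr((E_i E_k^T)^T rho).
rho_A_colon = sum(
    (e[i] @ e[k].T) * np.trace((E[i] @ E[k].T).T @ rho)
    for i in range(2)
    for k in range(2)
)

print(rho_A)  # rows [5, 9] and [21, 25], matching a+f, c+h, i+n, k+p
```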

Combining the above results answers the current question.
$$\eqalign{
\def\red#1{\color{red}{#1}}
\psi &= {\rm Tr}\big(\rho\log(\rho)\big) - {\rm Tr}\big(\rho_A\log(\rho_A)\big) \cr
d\psi &= \big(\log(\rho)^T+I\otimes I\big):d\rho - \big(\log(\rho_A)^T+I\big):d\rho_A \cr
&= \big(\log(\rho)^T+I\otimes I\big):d\rho - \big(\log(\rho_A)^T+I\big):e_ie_k^T\big(E_iE_k^T:d\rho\big) \cr
&= \big(\log(\rho)^T+I\otimes I\big):d\rho - \big(e_i^T\log(\rho_A)^Te_k+\delta_{ik}\big)\big(E_iE_k^T:d\rho\big) \cr
\frac{\partial\psi}{\partial \rho} &= \log(\rho)^T \red{+I\otimes I} - e_i^T\log(\rho_A)^Te_k\,E_iE_k^T \red{-E_kE_k^T} \cr
&= \log(\rho)^T - e_i^T\log(\rho_A)^Te_k\,E_iE_k^T \cr
&= \log(\rho)^T - e_i^T\log(\rho_A)^Te_k\,\big(e_ie_k^T\otimes I\big) \cr
&= \Big(\log(\rho) - \log(\rho_A)\otimes I\Big)^T \cr
}$$
The red terms cancel because, under the summation convention, $E_kE_k^T = (e_ke_k^T)\otimes I = I\otimes I$. The last step uses $e_i^T\log(\rho_A)^Te_k\;e_ie_k^T = \log(\rho_A)^T$.
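The final formula can also be checked by finite differences. The sketch below is only a numerical sanity check under stated assumptions: $\rho$ is a hypothetical real, symmetric, positive definite matrix with unit trace (so the transpose has no visible effect at this point), all 16 entries are perturbed independently, and ${\rm Tr}(M\log M)$ is evaluated as $\sum_i\lambda_i\log\lambda_i$. It does not address the complex case raised in the question.

```python
import numpy as np

def partial_trace_second(rho):
    """rho_A for a 4x4 rho: entries a+f, c+h, i+n, k+p in the question's notation."""
    return np.einsum("ijkj->ik", rho.reshape(2, 2, 2, 2))

def tr_x_log_x(M):
    """Tr(M log M) as the sum of lam * log(lam) over the eigenvalues of M."""
    lam = np.linalg.eigvals(M).astype(complex)
    return np.sum(lam * np.log(lam)).real

def psi(rho):
    return tr_x_log_x(rho) - tr_x_log_x(partial_trace_second(rho))

def sym_log(M):
    """Matrix logarithm of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.log(w)) @ V.T

# Hypothetical test state: symmetric, strictly diagonally dominant, trace 1.
rho = np.array([[0.40, 0.10, 0.05, 0.00],
                [0.10, 0.30, 0.00, 0.05],
                [0.05, 0.00, 0.20, 0.02],
                [0.00, 0.05, 0.02, 0.10]])
rho_A = partial_trace_second(rho)

# Claimed gradient: (log(rho) - log(rho_A) (x) I)^T
grad_formula = (sym_log(rho) - np.kron(sym_log(rho_A), np.eye(2))).T

# Central finite differences over each of the 16 entries.
eps = 1e-6
grad_fd = np.zeros((4, 4))
for a in range(4):
    for b in range(4):
        d_rho = np.zeros((4, 4))
        d_rho[a, b] = eps
        grad_fd[a, b] = (psi(rho + d_rho) - psi(rho - d_rho)) / (2 * eps)

max_err = np.max(np.abs(grad_fd - grad_formula))
print("max abs deviation:", max_err)
```

Perturbing a single entry makes the matrix slightly non-symmetric, which is why `tr_x_log_x` uses the general eigenvalue routine rather than a symmetric one.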