Derivative of a Hermitian sesquilinear form with respect to its own matrix

Let $H$ be an $n \times n$ Hermitian matrix (in my work, it's also positive semidefinite, if that makes a difference) and $a,b \in \mathbb{C}^n$, with $\lambda(H) = \langle a \vert H \vert b \rangle$. I'm using physicists' notation:
$\langle a \vert H \vert b \rangle = \sum_{jk} \overline{a_j} H_{jk} b_k$, where $\overline{\cdot}$ is the complex conjugate. I also denote $\langle a \rvert = a^\dagger$, the conjugate transpose of $a = \lvert a \rangle$.

Since $H$ is Hermitian, we have $\overline{\lambda}(H) = \langle b \vert H \vert a \rangle$. I need to compute the Wirtinger derivatives
$$ \frac{\partial \lambda(H)}{\partial H}, \qquad \frac{\partial \overline{\lambda}(H)}{\partial H}, $$

where

$$
\frac{\partial f(z)}{\partial z} = \frac12 \left( \frac{\partial f(z)}{\partial \text{Re}\ z} - i \frac{\partial f(z)}{\partial \text{Im}\ z} \right).
$$
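To make this definition concrete, here is a minimal numerical sketch in plain Python (my own illustration, not taken from any library). It approximates the Wirtinger derivative and its conjugate counterpart $\partial f/\partial \overline z = \frac12(\partial f/\partial \text{Re}\,z + i\,\partial f/\partial \text{Im}\,z)$ by central differences. The helper names `wirtinger` and `wirtinger_conj` and the step size `h` are hypothetical choices.

```python
def wirtinger(f, z, h=1e-6):
    """Approximate df/dz = (df/dRe z - i df/dIm z) / 2 by central differences."""
    d_re = (f(z + h) - f(z - h)) / (2 * h)
    d_im = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (d_re - 1j * d_im)


def wirtinger_conj(f, z, h=1e-6):
    """Approximate df/d(conj z) = (df/dRe z + i df/dIm z) / 2 by central differences."""
    d_re = (f(z + h) - f(z - h)) / (2 * h)
    d_im = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (d_re + 1j * d_im)


z0 = 0.7 - 1.3j
d_square = wirtinger(lambda z: z * z, z0)                   # close to 2 * z0
d_conj = wirtinger(lambda z: z.conjugate(), z0)             # close to 0
d_conj_conj = wirtinger_conj(lambda z: z.conjugate(), z0)   # close to 1
print(d_square, d_conj, d_conj_conj)
```

The second and third results are the sense in which $z$ and $\overline z$ are treated as independent variables: $\partial \overline z/\partial z = 0$ while $\partial \overline z/\partial \overline z = 1$.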

If $H$ were real (and unstructured), we would have by Eq. (70) of the Matrix Cookbook that

$$
\frac{\partial a^T H b}{\partial H} = a b^T.
$$
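A quick finite-difference check of this real, unstructured case, as a sketch with made-up vectors `a`, `b` and matrix `H` (all hypothetical test data): every entry $H_{pq}$ is perturbed independently, and the resulting gradient matches $ab^T$ entry by entry. This particular `H` happens to be symmetric, yet the entrywise gradient is not.

```python
def bilinear(a, H, b):
    """Compute a^T H b for real vectors a, b and a real square matrix H (list of rows)."""
    n = len(a)
    return sum(a[j] * H[j][k] * b[k] for j in range(n) for k in range(n))


def numerical_gradient(f, H, h=1e-6):
    """Central-difference gradient of scalar f at H, varying each H[p][q] independently."""
    n = len(H)
    G = [[0.0] * n for _ in range(n)]
    for p in range(n):
        for q in range(n):
            Hp = [row[:] for row in H]
            Hm = [row[:] for row in H]
            Hp[p][q] += h
            Hm[p][q] -= h
            G[p][q] = (f(Hp) - f(Hm)) / (2 * h)
    return G


a = [1.0, -2.0, 0.5]
b = [0.3, 4.0, -1.0]
H = [[2.0, 1.0, 0.0], [1.0, 3.0, -1.0], [0.0, -1.0, 1.0]]

G = numerical_gradient(lambda M: bilinear(a, M, b), H)
outer = [[a[p] * b[q] for q in range(3)] for p in range(3)]  # a b^T
# Largest entrywise discrepancy; only finite-difference rounding remains.
print(max(abs(G[p][q] - outer[p][q]) for p in range(3) for q in range(3)))
```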

So, naïvely, I would expect that I can just generalize this directly:

$$
\frac{\partial a^\dagger X b}{\partial X} = a b^\dagger = \lvert a \rangle\langle b \rvert.
$$

However, questions like this one, particularly the answer of Leandro Caniglia, suggest that the correct answer for a real symmetric matrix is

$$
\frac{\partial a^T H b}{\partial H} = \frac12 (ab^T + ba^T),
$$

which I suppose might generalize to my case as

$$
\frac{\partial a^\dagger H b}{\partial H} = \frac12 \left( ab^\dagger + b a^\dagger \right) = \frac12 \left( \lvert a \rangle\langle b \rvert + \lvert b \rangle\langle a \rvert \right).
$$
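For the real symmetric case, the two candidate gradients $ab^T$ and $\frac12(ab^T+ba^T)$ represent the same linear functional on symmetric perturbations and differ only along non-symmetric directions. A minimal sketch (hypothetical data; `frob` is my name for the double-dot product $A:B=\sum_{ij}A_{ij}B_{ij}$): for a symmetric direction $E$, both give the same $G:E$, and the derivative with respect to a single shared off-diagonal parameter $H_{01}=H_{10}=t$ is $a_0b_1+a_1b_0$, i.e. twice the corresponding entry of the symmetrized matrix.

```python
def frob(A, B):
    """Double-dot product A:B = sum_ij A_ij B_ij (no conjugation)."""
    return sum(x * y for ra, rb in zip(A, B) for x, y in zip(ra, rb))


a = [1.0, -2.0, 0.5]
b = [0.3, 4.0, -1.0]
n = 3
G = [[a[p] * b[q] for q in range(n)] for p in range(n)]                 # a b^T
S = [[0.5 * (G[p][q] + G[q][p]) for q in range(n)] for p in range(n)]  # (a b^T + b a^T) / 2

# An arbitrary symmetric direction: both candidates agree here.
E = [[1.0, 2.0, -1.0], [2.0, 0.0, 3.0], [-1.0, 3.0, 5.0]]
print(frob(G, E), frob(S, E))

# Direction of one shared off-diagonal parameter t with H_01 = H_10 = t.
E01 = [[0.0] * n for _ in range(n)]
E01[0][1] = E01[1][0] = 1.0
print(frob(G, E01), a[0] * b[1] + a[1] * b[0], 2 * S[0][1])

# A non-symmetric direction, which a symmetric H can never move along:
# here the two candidates disagree.
F = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(frob(G, F), frob(S, F))
```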

However, I can also notice that
$$ \lambda(H) = \langle a | H | b \rangle = \text{Tr} H \lvert b \rangle\langle a \rvert = \text{Tr} \lvert b \rangle\langle a \rvert H = \text{Tr} \lvert b \rangle\langle a \rvert H^\dagger; $$
this allows me to apply Eqs. (240) and (241) of the Cookbook,

$$
\frac{\partial \text{Tr} (A X^\dagger)}{\partial \text{Re} X} = i \frac{\partial \text{Tr} (A X^\dagger)}{\partial \text{Im} X} = A,
$$

to obtain

$$ \frac{\partial \lambda(H)}{\partial H} = \frac{\partial \overline{\lambda}(H)}{\partial H} = 0. $$

This does not agree with my intuition. Where am I going wrong?

P.S. As an aside, comments on the question I linked suggest that the naïve symmetrization operator, rather than the symmetrization listed in the Matrix Cookbook (e.g., Eqs. (139) and (142)), gives the correct gradient; the preprint mentioned there was published this year.
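The trace rewriting used in the question can be sanity-checked numerically. This is a sketch with seeded random data; `rand_c`, `matmul`, `trace` and `dagger` are small helpers written out here, not library functions. For a Hermitian $H$ built as $M+M^\dagger$, all three traces agree with $\langle a|H|b\rangle$, and $\overline{\lambda}$ agrees with $\langle b|H|a\rangle$.

```python
import random


def rand_c():
    """Random complex number with real and imaginary parts in [-1, 1]."""
    return complex(random.uniform(-1, 1), random.uniform(-1, 1))


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def dagger(A):
    """Conjugate transpose."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]


random.seed(0)
n = 3
a = [rand_c() for _ in range(n)]
b = [rand_c() for _ in range(n)]
M = [[rand_c() for _ in range(n)] for _ in range(n)]
H = [[M[j][k] + M[k][j].conjugate() for k in range(n)] for j in range(n)]  # Hermitian

lam = sum(a[j].conjugate() * H[j][k] * b[k] for j in range(n) for k in range(n))     # <a|H|b>
lam_ba = sum(b[j].conjugate() * H[j][k] * a[k] for j in range(n) for k in range(n))  # <b|H|a>
ket_b_bra_a = [[b[i] * a[j].conjugate() for j in range(n)] for i in range(n)]        # |b><a|

t1 = trace(matmul(H, ket_b_bra_a))          # Tr H |b><a|
t2 = trace(matmul(ket_b_bra_a, H))          # Tr |b><a| H
t3 = trace(matmul(ket_b_bra_a, dagger(H)))  # Tr |b><a| H^dagger (equal only because H is Hermitian)
print(lam, t1, t2, t3, lam.conjugate() - lam_ba)
```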

Best Answer

$ \def\qif{\quad\iff\quad}\def\l{\lambda} \def\a{{\overline a}} \def\H{{\overline H}} \def\L{{\overline\l}} \def\p{\partial}\def\g#1#2{\frac{\p #1}{\p #2}} $Ditching the physicists' bra-ket notation for the engineers' double-dot product yields $$\eqalign{ \l &= \a b^T:H &\qif \L = ab^\dagger:\H \\ \g{\l}{H} &= \a b^T &\qif \g{\L}{\H} = ab^\dagger \\ \g{\l}{\H} &= 0 &\qif \g{\L}{H} = 0 \\ }$$ where $\;A:B \equiv \sum_i\sum_j A_{ij}B_{ij}\quad$ (with no conjugation of either factor!)
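The double-dot identities above can be checked numerically with a short sketch (seeded random data; `ddot` is a hypothetical helper for $A:B$, deliberately without conjugation). For a Hermitian $H$ it confirms $\lambda = \overline a b^T : H$ and $\overline\lambda = ab^\dagger : \overline H$, and that $ab^\dagger$ is exactly the complex conjugate of $\overline a b^T$, consistent with the general Wirtinger identity $\overline{\partial f/\partial z} = \partial \overline f/\partial \overline z$.

```python
import random


def rand_c():
    """Random complex number with real and imaginary parts in [-1, 1]."""
    return complex(random.uniform(-1, 1), random.uniform(-1, 1))


def ddot(A, B):
    """Double-dot product A:B = sum_ij A_ij B_ij, with NO conjugation of either factor."""
    return sum(x * y for ra, rb in zip(A, B) for x, y in zip(ra, rb))


random.seed(2)
n = 3
a = [rand_c() for _ in range(n)]
b = [rand_c() for _ in range(n)]
M = [[rand_c() for _ in range(n)] for _ in range(n)]
H = [[M[j][k] + M[k][j].conjugate() for k in range(n)] for j in range(n)]  # Hermitian
H_bar = [[h.conjugate() for h in row] for row in H]                        # conj(H)

abar_bT = [[a[j].conjugate() * b[k] for k in range(n)] for j in range(n)]  # conj(a) b^T
a_bdag = [[a[j] * b[k].conjugate() for k in range(n)] for j in range(n)]   # a b^dagger

lam = sum(a[j].conjugate() * H[j][k] * b[k] for j in range(n) for k in range(n))  # <a|H|b>
print(abs(ddot(abar_bT, H) - lam))                 # ~0: lambda = conj(a) b^T : H
print(abs(ddot(a_bdag, H_bar) - lam.conjugate()))  # ~0: conj(lambda) = a b^dagger : conj(H)
```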

None of the confusion surrounding symmetric gradients carries over to the Hermitian case, for the simple reason that (in the Wirtinger sense) a Hermitian matrix $H$ is independent of its Hermitian and complex conjugates, i.e. $$\eqalign{ \g{H_{ij}^\dagger}{H_{pq}} = \g{\H_{ji}}{H_{pq}} = 0 \qquad\qquad }$$ However, a real matrix $R$ can never be independent of its transpose: $$\eqalign{ \g{R_{ij}^T}{R_{pq}} = \g{R_{ji}}{R_{pq}} = \delta_{jp}\delta_{iq} \:\ne\: 0 }$$
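One way to see where the zero in the question comes from is to treat the matrix as unstructured and compare the two trace expressions as functions of a general complex $X$. The sketch below (hypothetical helper `wirtinger_grads`, seeded random data) computes entrywise Wirtinger derivatives by central differences. $f_1(X)=\text{Tr}\,\lvert b\rangle\langle a\rvert X$ gives $\partial f_1/\partial X = \overline a b^T$ and $\partial f_1/\partial \overline X = 0$, whereas $f_2(X)=\text{Tr}\,\lvert b\rangle\langle a\rvert X^\dagger$ gives $\partial f_2/\partial X = 0$ and $\partial f_2/\partial \overline X = \lvert b\rangle\langle a\rvert$. The two functions coincide on Hermitian matrices but are different functions of an unstructured $X$, so their Wirtinger derivatives need not agree.

```python
import random


def rand_c():
    """Random complex number with real and imaginary parts in [-1, 1]."""
    return complex(random.uniform(-1, 1), random.uniform(-1, 1))


def wirtinger_grads(f, X, h=1e-6):
    """Entrywise (df/dX, df/dconj(X)) by central differences, treating every
    entry X[p][q] as an independent complex variable."""
    n = len(X)
    dX = [[0j] * n for _ in range(n)]
    dXc = [[0j] * n for _ in range(n)]
    for p in range(n):
        for q in range(n):
            def shifted(delta):
                Y = [row[:] for row in X]
                Y[p][q] += delta
                return f(Y)
            d_re = (shifted(h) - shifted(-h)) / (2 * h)
            d_im = (shifted(1j * h) - shifted(-1j * h)) / (2 * h)
            dX[p][q] = 0.5 * (d_re - 1j * d_im)
            dXc[p][q] = 0.5 * (d_re + 1j * d_im)
    return dX, dXc


random.seed(1)
n = 3
a = [rand_c() for _ in range(n)]
b = [rand_c() for _ in range(n)]
X = [[rand_c() for _ in range(n)] for _ in range(n)]  # unstructured, NOT Hermitian


def f1(Y):
    """Tr |b><a| Y = sum_jk conj(a_j) Y_jk b_k."""
    return sum(a[j].conjugate() * Y[j][k] * b[k] for j in range(n) for k in range(n))


def f2(Y):
    """Tr |b><a| Y^dagger = sum_jk b_j conj(a_k) conj(Y_jk)."""
    return sum(b[j] * a[k].conjugate() * Y[j][k].conjugate() for j in range(n) for k in range(n))


dX1, dXc1 = wirtinger_grads(f1, X)
dX2, dXc2 = wirtinger_grads(f2, X)

# On a Hermitian matrix the two expressions give the same number.
H = [[X[j][k] + X[k][j].conjugate() for k in range(n)] for j in range(n)]
print(abs(f1(H) - f2(H)), abs(f1(X) - f2(X)))
```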
