Gradient of trace norm of complex matrix

complex-analysis, matrices, matrix-calculus, matrix-norms

The problem:

Let $S \in \mathbb{C}^{N\times M}$ with $N > M$ and $S^{H}S=\mathbb{I}$, let $\rho \in \mathbb{C}^{M\times M}$ and $\sigma \in \mathbb{C}^{N\times N}$ be Hermitian matrices of trace $1$, and define the function $D: \mathbb{C}^{N\times M} \rightarrow \mathbb{R}$ as:

$$D(S) = \text{tr}(|S\rho S^{H} - \sigma|),$$

with $|X| = (X^{H}X)^{1/2}$ and $^H$ denoting the Hermitian transpose, i.e., $D$ is the trace distance (up to the conventional factor of $\tfrac{1}{2}$). My goal is to compute $\nabla_S D(S)$, the gradient of $D$ w.r.t. $S$.

My approach:

I defined the following variables:

$$A = S\rho S^{H} - \sigma$$
$$B = A^H A.$$

$D$ then becomes:

$$D = \text{tr}(B^{1/2})$$
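
As a sanity check on this reformulation, here is a small NumPy sketch. It is not part of the original post, and the helper names, the sizes $N=4$, $M=2$ and the random test matrices are all illustrative assumptions. It evaluates $\text{tr}(B^{1/2})$ from the eigenvalues of $B = A^HA$ and compares the result with the sum of the singular values of $A$, which should be the same number.

```python
import numpy as np

def random_hermitian_trace_one(n, rng):
    """Random Hermitian n x n matrix rescaled to trace 1 (illustrative helper)."""
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    H = X @ X.conj().T               # Hermitian, positive semidefinite
    return H / np.trace(H).real      # trace(H) > 0, so the result has trace 1

def trace_norm_via_sqrt(A):
    """tr(B^{1/2}) with B = A^H A, computed from the eigenvalues of B."""
    B = A.conj().T @ A
    w = np.linalg.eigvalsh(B)        # real eigenvalues of the Hermitian matrix B
    w = np.clip(w, 0.0, None)        # guard against tiny negative round-off
    return np.sum(np.sqrt(w))

def trace_norm_via_svd(A):
    """Sum of the singular values of A."""
    return np.sum(np.linalg.svd(A, compute_uv=False))

rng = np.random.default_rng(0)
N, M = 4, 2

# Isometry S (S^H S = I) taken from a reduced QR factorization.
Z = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
S, _ = np.linalg.qr(Z)

rho = random_hermitian_trace_one(M, rng)
sigma = random_hermitian_trace_one(N, rng)

A = S @ rho @ S.conj().T - sigma
d_sqrt = trace_norm_via_sqrt(A)
d_svd = trace_norm_via_svd(A)
print(d_sqrt, d_svd)                 # the two values should agree to round-off
```

The singular-value route avoids forming a matrix square root explicitly, which is usually the more convenient way to evaluate $D$ numerically.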

The goal is now to take the differential of $D$ and rearrange terms to eventually arrive at something like:

$$dD = \text{tr} (K dS),$$

with the transpose of $K$, $K^T$, being the gradient we're looking for.

My progress so far:

$$dD = d(\text{tr}(B^{1/2})) = \text{tr}(d(B^{1/2}))$$
$$dD = \frac{1}{2}\text{tr}(B^{-1/2}\, dB)$$

We have:

$$dB = (dA)^HA + A^HdA$$

And:

$$dA = dS\rho S^H + S\rho (dS)^H$$
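
A short expansion shows where this expression comes from. Replacing $S$ by $S + dS$ in $S\rho S^H - \sigma$ and subtracting the unperturbed value gives

$$A(S + dS) - A(S) = dS\,\rho S^H + S\rho\,(dS)^H + dS\,\rho\,(dS)^H.$$

The differential $dA$ keeps only the terms that are linear in $dS$ and $(dS)^H$; the last term is second order and is dropped. The same product-rule expansion applied to $B = A^HA$ gives the expression for $dB$.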

I will now get terms with $dS$ and terms with $(dS)^H$ and I'm not sure how to manipulate them to get to an expression from which I can read out the gradient. Is this even the (or a) right approach?

Best Answer

You've done all the hard work. Now you just need to do some algebra to substitute the various differentials and rearrange things into a suitable form.

$$\eqalign{
dB &= dA^HA + A^HdA \cr
&= (dS\,\rho S^H + S\rho\,dS^H)^HA + A^H(dS\,\rho S^H + S\rho\,dS^H) \cr
&= (S\rho^H\,dS^H + dS\,\rho^HS^H)A + A^H(dS\,\rho S^H + S\rho\,dS^H) \cr
&= (dS\,\rho^HS^HA + A^HdS\,\rho S^H) + (S\rho^H\,dS^HA + A^HS\rho\,dS^H) \cr
&= (dS\text{-terms}) \quad+\quad (dS^H\text{-terms}) \cr
\cr
C &= \tfrac{1}{2}\big(B^{-1/2}\big)^T \quad \text{(for convenience)} \cr
\cr
dD &= C:dB \cr
&= C:dS\,\rho^HS^HA + C:A^HdS\,\rho S^H + (dS^H\text{-terms}) \cr
&= (CA^TS^*\rho^* + A^*CS^*\rho^T):dS \quad + (\text{terms}):dS^H \cr
\frac{\partial D}{\partial S} &= CA^TS^*\rho^* + A^*CS^*\rho^T \cr
}$$

The gradient wrt the conjugate variable is simply the conjugate of the gradient.

$$\eqalign{
\frac{\partial D}{\partial S^H} &= (CA^TS^*\rho^* + A^*CS^*\rho^T)^H \cr
}$$

NB: Colons denote trace/Frobenius products, i.e. $\,A:B={\rm Tr}(A^TB)$
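
The closed-form gradient above can be checked against a finite difference. The NumPy sketch below is an illustration under stated assumptions, not part of the original answer: the function names, the hand-picked $3\times 2$ test data and the step size `h` are all made up for the check. It treats $S$ as an unconstrained complex matrix, as the derivation does, so the constraint $S^HS=\mathbb{I}$ is ignored. Because $D$ is real-valued, the $dS^H$ terms are the complex conjugate of the $dS$ terms, so the first-order change along a direction $E$ is $2\,\mathrm{Re}\sum_{ij} G_{ij}E_{ij}$ with $G = \partial D/\partial S$.

```python
import numpy as np

def trace_distance(S, rho, sigma):
    """D(S) = tr(B^{1/2}) with B = A^H A, i.e. the sum of singular values of A."""
    A = S @ rho @ S.conj().T - sigma
    return np.sum(np.linalg.svd(A, compute_uv=False))

def wirtinger_gradient(S, rho, sigma):
    """dD/dS = C A^T S^* rho^* + A^* C S^* rho^T with C = (1/2) (B^{-1/2})^T."""
    A = S @ rho @ S.conj().T - sigma
    B = A.conj().T @ A
    w, V = np.linalg.eigh(B)                     # B = V diag(w) V^H, w > 0 assumed
    B_inv_sqrt = (V / np.sqrt(w)) @ V.conj().T   # B^{-1/2}
    C = 0.5 * B_inv_sqrt.T
    return C @ A.T @ S.conj() @ rho.conj() + A.conj() @ C @ S.conj() @ rho.T

# Hand-picked illustrative data: Hermitian, trace 1, and chosen so that
# A is safely nonsingular (B^{-1/2} exists).
rho = np.array([[0.8, 0.03 + 0.04j],
                [0.03 - 0.04j, 0.2]])
sigma = np.array([[0.2, 0.0, 0.02j],
                  [0.0, 0.4, 0.03],
                  [-0.02j, 0.03, 0.4]])
S = np.array([[1j, 0.0],
              [0.0, (1 + 1j) / np.sqrt(2)],
              [0.0, 0.0]])                       # satisfies S^H S = I

# Random perturbation direction and central finite difference.
rng = np.random.default_rng(1)
E = rng.standard_normal(S.shape) + 1j * rng.standard_normal(S.shape)
h = 1e-6

G = wirtinger_gradient(S, rho, sigma)
predicted = 2.0 * np.real(np.sum(G * E))         # G:E plus its complex conjugate
numeric = (trace_distance(S + h * E, rho, sigma)
           - trace_distance(S - h * E, rho, sigma)) / (2.0 * h)
print(predicted, numeric)                        # should agree closely
```

If the optimization must stay on the set $S^HS=\mathbb{I}$, this unconstrained gradient would still need to be projected or otherwise combined with the constraint; the check above does not address that.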