Complex Matrix Gradient of Frobenius Norm

complex-numbers derivatives matrices matrix-calculus matrix-norms

I want to find the complex gradient of $|| \mathbf{X} - \mathbf{D} \mathbf{A} ||_{F}^{2}$ with respect to $\mathbf{A}$ when every matrix is complex. I know that if everything were real, I would have:

$$ \nabla_{\mathbf{A}} || \mathbf{X} – \mathbf{D} \mathbf{A} ||_{F}^{2} = \frac{ \partial || \mathbf{X} – \mathbf{D} \mathbf{A} ||_{F}^{2}}{ \partial \mathbf{A}} = 2 \mathbf{D}^{T} \left( \mathbf{D} \mathbf{A} – \mathbf{X} \right)$$

Question: But what about the complex case?


Attempt:

From the Matrix Cookbook, since $|| \mathbf{X} - \mathbf{D} \mathbf{A} ||_{F}^{2}$ is a real-valued function of a complex matrix, we have:

$$ \nabla_{\mathbf{A}} || \mathbf{X} – \mathbf{D} \mathbf{A} ||_{F}^{2} = 2 \frac{ \partial || \mathbf{X} – \mathbf{D} \mathbf{A} ||_{F}^{2}}{ \partial \mathbf{A^{\ast}}} $$

where $\mathbf{A}^{\ast}$ is the complex conjugate of $\mathbf{A}$.

Writing the squared Frobenius norm as a trace and taking its differential, we have:

$$
|| \mathbf{X} – \mathbf{D} \mathbf{A} ||_{F}^{2} = \text{Trace}\left( (\mathbf{X} – \mathbf{D} \mathbf{A})^{H}(\mathbf{X} – \mathbf{D} \mathbf{A}) \right)
$$

$$
d|| \mathbf{X} – \mathbf{D} \mathbf{A} ||_{F}^{2} = \text{Trace}\left( – \mathbf{X}^{H}\mathbf{D} d\mathbf{A} – d\mathbf{A}^{H} \mathbf{D}^{H} \mathbf{X} + d \mathbf{A}^{H} \mathbf{D}^{H} \mathbf{D} \mathbf{A} + \mathbf{A}^{H} \mathbf{D}^{H} \mathbf{D} d\mathbf{A} \right)
$$

But since $|| \mathbf{X} - \mathbf{D} \mathbf{A} ||_{F}^{2}$ is real, so is its differential, and a real trace satisfies $\text{Trace}(\mathbf{M}) = \text{Trace}(\mathbf{M})^{\ast} = \text{Trace}(\mathbf{M}^{\ast})$. Conjugating the argument of the trace, we can write this as:

$$
d|| \mathbf{X} – \mathbf{D} \mathbf{A} ||_{F}^{2} = \text{Trace}\left( – \mathbf{X}^{T}\mathbf{D}^{\ast} d\mathbf{A}^{\ast} – d\mathbf{A}^{T} \mathbf{D}^{T} \mathbf{X}^{\ast} + d \mathbf{A}^{T} \mathbf{D}^{T} \mathbf{D}^{\ast} \mathbf{A}^{\ast} + \mathbf{A}^{T} \mathbf{D}^{T} \mathbf{D}^{\ast} d\mathbf{A}^{\ast} \right)
$$

$$
d|| \mathbf{X} – \mathbf{D} \mathbf{A} ||_{F}^{2} = -\text{Trace}\left( \mathbf{X}^{T} \mathbf{D} d \mathbf{A}^{\ast} \right) – \text{Trace}\left( \mathbf{D}^{T} \mathbf{X}^{\ast} d \mathbf{A}^{T} \right) + \text{Trace}\left( \mathbf{D}^{T} \mathbf{D} \mathbf{A}^{\ast} d \mathbf{A}^{T} \right) + \text{Trace}\left( \mathbf{A}^{T} \mathbf{D}^{T} \mathbf{D}^{\ast} d A^{\ast} \right)
$$

And so, dropping the terms with $d A^{T}$, we have:

$$
\frac{ \partial || \mathbf{X} – \mathbf{D} \mathbf{A} ||_{F}^{2}}{ \partial \mathbf{A^{\ast}}} = – \mathbf{D}^{T} \mathbf{X} + \mathbf{D}^{H} \mathbf{D} \mathbf{A}
$$

And so:

$$ \nabla_{\mathbf{A}} || \mathbf{X} – \mathbf{D} \mathbf{A} ||_{F}^{2} = -2 \mathbf{D}^{T} \mathbf{X} + 2\mathbf{D}^{H} \mathbf{D} \mathbf{A} $$

Am I correct about this or am I missing something?


Edit: Okay I think I have this figured out. We start with:

$$
d || \mathbf{X} – \mathbf{D} \mathbf{A} ||_{F}^{2} = \text{Trace}\left( – \mathbf{X}^{H}\mathbf{D} d\mathbf{A} – d\mathbf{A}^{H} \mathbf{D}^{H} \mathbf{X} + d \mathbf{A}^{H} \mathbf{D}^{H} \mathbf{D} \mathbf{A} + \mathbf{A}^{H} \mathbf{D}^{H} \mathbf{D} d\mathbf{A} \right)
$$

We rewrite this as:

$$
d || \mathbf{X} – \mathbf{D} \mathbf{A} ||_{F}^{2} = \text{Trace}\left( – \mathbf{X}^{H}\mathbf{D} d\mathbf{A} – d\mathbf{A^{\ast}}^{T} \mathbf{D}^{H} \mathbf{X} + d \mathbf{A^{\ast}}^{T} \mathbf{D}^{H} \mathbf{D} \mathbf{A} + \mathbf{A}^{H} \mathbf{D}^{H} \mathbf{D} d\mathbf{A} \right)
$$

$$
d || \mathbf{X} – \mathbf{D} \mathbf{A} ||_{F}^{2} = – \text{Trace}\left( \mathbf{X}^{H}\mathbf{D} d\mathbf{A} \right) – \text{Trace} \left( \mathbf{D}^{H} \mathbf{X} d\mathbf{A^{\ast}}^{T} \right) + \text{Trace} \left( \mathbf{D}^{H} \mathbf{D} \mathbf{A} d \mathbf{A^{\ast}}^{T} \right) + \text{Trace} \left(\mathbf{A}^{H} \mathbf{D}^{H} \mathbf{D} d\mathbf{A} \right)
$$

So to compute $\frac{\partial || \mathbf{X} - \mathbf{D} \mathbf{A} ||^{2}_{F}}{ \partial \mathbf{A}^{\ast}}$, we need to treat $\mathbf{A}$ and $\mathbf{A}^{\ast}$ as independent variables. In other words, we drop every term containing $d\mathbf{A}$ and keep only the terms containing $d\mathbf{A}^{\ast}$:

$$ \rightarrow -\text{Trace} \left( d\mathbf{A^{\ast}}^{T} \mathbf{D}^{H} \mathbf{X} \right) + \text{Trace} \left( d \mathbf{A^{\ast}}^{T} \mathbf{D}^{H} \mathbf{D} \mathbf{A} \right)$$

Thus:

$$\frac{\partial || \mathbf{X} – \mathbf{D} \mathbf{A} ||^{2}_{F}}{ \partial \mathbf{A}^{\ast}} = – \mathbf{D}^{H} \mathbf{X} + \mathbf{D}^{H} \mathbf{D} \mathbf{A} = \mathbf{D}^{H} \left( \mathbf{D} \mathbf{A} – \mathbf{X} \right)
$$

This uses the rule that if $d \, \text{Trace} \left( F(\mathbf{Y}) \right) = \text{Trace} \left( f(\mathbf{Y})^{T} d \mathbf{Y} \right)$, then $f(\cdot)$ is the derivative of $\text{Trace} \left( F(\cdot) \right)$ with respect to $\mathbf{Y}$. Because the trace is invariant under transposition, $\text{Trace} \left( f(\mathbf{Y})^{T} d \mathbf{Y} \right) = \text{Trace} \left( d \mathbf{Y}^{T} f(\mathbf{Y}) \right)$, and here we let $\mathbf{Y} = \mathbf{A}^{\ast}$.

Similarly, to find $\frac{\partial || \mathbf{X} – \mathbf{D} \mathbf{A} ||^{2}_{F}}{ \partial \mathbf{A}}$, we need to drop all $d\mathbf{A^{\ast}}$ terms:

$$ \rightarrow – \text{Trace}\left( \mathbf{X}^{H} \mathbf{D} d \mathbf{A} \right) + \text{Trace}\left( \mathbf{A}^{H} \mathbf{D}^{H} \mathbf{D} d \mathbf{A} \right) = – \text{Trace}\left( \mathbf{X^{\ast}}^{T} \mathbf{D} d \mathbf{A} \right) + \text{Trace}\left( \mathbf{A^{\ast}}^{T} \mathbf{D^\ast}^{T} \mathbf{D} d \mathbf{A} \right)$$

So using the same rule, we get:

$$\frac{\partial || \mathbf{X} - \mathbf{D} \mathbf{A} ||^{2}_{F}}{ \partial \mathbf{A}} = -\mathbf{D}^{T} \mathbf{X}^{\ast} + \mathbf{D}^{T} \mathbf{D}^{\ast} \mathbf{A}^{\ast} = \mathbf{D}^{T}\left( \mathbf{D}^{\ast} \mathbf{A}^{\ast} - \mathbf{X}^{\ast} \right) = \mathbf{D}^{T} \left(\mathbf{D} \mathbf{A}- \mathbf{X} \right)^{\ast}
$$

So we have:

$$ \nabla_{\mathbf{A}} || \mathbf{X} – \mathbf{D} \mathbf{A} ||_{F}^{2} = 2 \frac{ \partial || \mathbf{X} – \mathbf{D} \mathbf{A} ||_{F}^{2}}{ \partial \mathbf{A^{\ast}}} = 2 \mathbf{D}^{H} \left( \mathbf{D} \mathbf{A} – \mathbf{X} \right) $$

and

$$ \nabla_{\mathbf{A^{\ast}}} || \mathbf{X} – \mathbf{D} \mathbf{A} ||_{F}^{2} = 2 \frac{ \partial || \mathbf{X} – \mathbf{D} \mathbf{A} ||_{F}^{2}}{ \partial \mathbf{A}} = 2\mathbf{D}^{T} \left(\mathbf{D} \mathbf{A}- \mathbf{X} \right)^{\ast}$$
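The two closed forms above can be checked numerically. The following is only a minimal sketch, assuming NumPy and random complex test matrices of arbitrary size; the helper names `crandn` and `wirtinger_conj_fd` are hypothetical and are not taken from any library. It uses the standard identity $\frac{\partial f}{\partial a^{\ast}} = \frac{1}{2}\left( \frac{\partial f}{\partial \operatorname{Re} a} + i \frac{\partial f}{\partial \operatorname{Im} a} \right)$ and approximates the two real partial derivatives with central differences.

```python
import numpy as np

rng = np.random.default_rng(0)

def crandn(*shape):
    """Random complex matrix (hypothetical helper, for testing only)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Arbitrary illustrative sizes: X is m x n, D is m x k, A is k x n.
m, k, n = 5, 3, 4
X, D, A = crandn(m, n), crandn(m, k), crandn(k, n)

def f(A):
    """Real-valued objective ||X - D A||_F^2."""
    return np.linalg.norm(X - D @ A, "fro") ** 2

def wirtinger_conj_fd(f, A, h=1e-6):
    """Finite-difference estimate of df/dA^* = (df/dRe(A) + 1j * df/dIm(A)) / 2."""
    G = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        E = np.zeros_like(A)
        E[idx] = 1.0
        d_re = (f(A + h * E) - f(A - h * E)) / (2 * h)            # perturb real part
        d_im = (f(A + 1j * h * E) - f(A - 1j * h * E)) / (2 * h)  # perturb imaginary part
        G[idx] = 0.5 * (d_re + 1j * d_im)
    return G

G_fd = wirtinger_conj_fd(f, A)

# Closed forms from the derivation above.
G_conj_formula = D.conj().T @ (D @ A - X)    # df/dA^* = D^H (D A - X)
G_plain_formula = D.T @ (D @ A - X).conj()   # df/dA   = D^T (D A - X)^*

# Since f is real, df/dA is the complex conjugate of df/dA^*.
err_conj = np.max(np.abs(G_fd - G_conj_formula))
err_plain = np.max(np.abs(G_fd.conj() - G_plain_formula))
print(err_conj, err_plain)
```

Both errors should be at the level of finite-difference round-off. Replacing $\mathbf{D}^{H}$ with $\mathbf{D}^{T}$ in the first formula makes the first check fail for a generic complex $\mathbf{D}$, which is a quick way to spot a missing conjugate.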

Best Answer

For typing convenience, introduce the matrix variable

$$\eqalign{ B &= DA-X \\ }$$

and the Frobenius product notation for the trace

$$\eqalign{ A:B &= \sum_{i=1}^m \sum_{j=1}^n A_{ij} B_{ij} \;=\; {\rm Tr}(AB^T) \\ B^*:B &= \big\|B\big\|^2_F \\ }$$

Then, treating $A$ and $A^*$ (and hence $B$ and $B^*$) as independent variables, the calculations for the Wirtinger gradients are

$$\eqalign{ \phi &= B^*:B \\ d\phi &= B^*:dB \;=\; B^*:D\,dA \;=\; D^TB^*:dA \\ \frac{\partial \phi}{\partial A} &= D^TB^* \;=\; D^T(DA-X)^* \\ \frac{\partial \phi}{\partial A^*} &= \left(\frac{\partial \phi}{\partial A}\right)^* \;=\; D^H(DA-X) \\ }$$

But if the matrices are real then

$$\eqalign{ \phi &= B:B \\ d\phi &= 2B:dB \;=\; 2D^TB:dA \\ \frac{\partial \phi}{\partial A} &= 2D^TB \;=\; 2D^T(DA-X) \\ }$$

and the factor of two makes its appearance.
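As a sanity check, the same steps can be carried out in the scalar case. This is only an illustrative sketch: the lowercase symbols $a$, $d$, $x$ are assumed scalar stand-ins for $A$, $D$, $X$ and do not appear in the original derivation. With $a$ and $a^*$ treated as independent,

$$\eqalign{ \phi &= |da-x|^2 \;=\; (da-x)^*(da-x) \\ \frac{\partial \phi}{\partial a} &= d\,(da-x)^* \\ \frac{\partial \phi}{\partial a^*} &= d^*(da-x) \\ }$$

which matches $D^T(DA-X)^*$ and $D^H(DA-X)$ with no factor of two. If everything is real, then $\phi = (da-x)^2$ and the ordinary derivative is $2d\,(da-x)$, because $a$ and $a^*$ coincide and both contributions add.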


Setting the gradient to zero and solving for the optimal $A$ yields $$\eqalign{ D^HDA &= D^HX \quad\implies\quad A \;=\; (D^HD)^{-1}D^HX \;\doteq\; D^+X }$$ which is identical to the least-squares solution of $\;DA=X$. This form assumes $D$ has full column rank, so that $D^HD$ is invertible.
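A short numerical illustration of this closed form, as a sketch only: it assumes NumPy, a randomly generated tall complex $D$ (which has full column rank with probability one), and arbitrary sizes chosen for the example. It compares the normal-equations solution with the pseudoinverse and with NumPy's least-squares solver, and evaluates the gradient at the solution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary illustrative sizes; D is tall, so D^H D is invertible for generic data.
m, k, n = 6, 3, 2
D = rng.standard_normal((m, k)) + 1j * rng.standard_normal((m, k))
X = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

DH = D.conj().T  # Hermitian transpose D^H

# Normal equations: D^H D A = D^H X
A_normal = np.linalg.solve(DH @ D, DH @ X)

# Pseudoinverse form: A = D^+ X
A_pinv = np.linalg.pinv(D) @ X

# Library least-squares solution of D A = X
A_lstsq = np.linalg.lstsq(D, X, rcond=None)[0]

# Gradient 2 D^H (D A - X) evaluated at the candidate minimizer
grad_at_solution = 2 * DH @ (D @ A_normal - X)

print(np.max(np.abs(A_normal - A_pinv)))
print(np.max(np.abs(A_normal - A_lstsq)))
print(np.max(np.abs(grad_at_solution)))
```

All three printed values should be near machine precision. For an ill-conditioned $D$, forming $D^HD$ explicitly squares the condition number, so the `pinv` or `lstsq` route is usually the safer choice in practice.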