Given the matrices $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{m \times m}$, let the scalar field $f : \mathbb{R}^{m \times n} \to \mathbb{R}$ be defined by
$$ f(C) := \frac{1}{2}\left\lVert CA - BC \right\rVert_F^2 $$
What is the gradient $\nabla f$?
I am trying to differentiate this function w.r.t. $C$, but I cannot find a way to manipulate the expression that would enable me to do so. I have also tried a definition of the derivative adapted to this case, but I don't end up with something that looks useful at first glance. I end up with a linear map $df(C)$ defined by the expression
$$
df(C)E = \operatorname{trace} \left\{ (CA - BC)^T (EA - BE)\right\} = \left\langle CA - BC, EA - BE\right\rangle
$$
which then leads me to
$$
df(C) = \left\langle AA^TC^T - AC^TB^T - A^TC^TB + C^TB^TB, \cdot \right\rangle
$$
Is this expression correct?
Best Answer
Let
$$ f({\bf X}) := \frac12 \left\| {\bf X} {\bf A} - {\bf B} {\bf X} \right\|_{\text{F}}^2 $$
Using the definition of the Frobenius norm and the cyclic property of the trace,
$$ \nabla_{{\bf X}} f({\bf X}) = \cdots = \color{blue}{({\bf X} {\bf A} - {\bf B} {\bf X}) {\bf A}^\top - {\bf B}^\top ({\bf X} {\bf A} - {\bf B} {\bf X})} $$
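One way to fill in the omitted steps is sketched here. The symbol ${\bf R} := {\bf X}{\bf A} - {\bf B}{\bf X}$ for the residual and the direction ${\bf E} \in \mathbb{R}^{m \times n}$ are introduced only for brevity. Since $f$ is quadratic, its directional derivative at ${\bf X}$ in the direction ${\bf E}$ is the term of $f({\bf X} + {\bf E}) - f({\bf X})$ that is linear in ${\bf E}$:

$$
\begin{aligned}
\mathrm{d}f({\bf X})\,{\bf E}
&= \left\langle {\bf R},\, {\bf E}{\bf A} - {\bf B}{\bf E} \right\rangle \\
&= \operatorname{tr}\left( {\bf R}^\top {\bf E} {\bf A} \right) - \operatorname{tr}\left( {\bf R}^\top {\bf B} {\bf E} \right) \\
&= \operatorname{tr}\left( \left( {\bf R} {\bf A}^\top \right)^\top {\bf E} \right) - \operatorname{tr}\left( \left( {\bf B}^\top {\bf R} \right)^\top {\bf E} \right) \\
&= \left\langle {\bf R} {\bf A}^\top - {\bf B}^\top {\bf R},\, {\bf E} \right\rangle
\end{aligned}
$$

The third line uses the cyclic property of the trace. The gradient is the matrix ${\bf G}$ with $\mathrm{d}f({\bf X})\,{\bf E} = \langle {\bf G}, {\bf E} \rangle$ for every ${\bf E}$, which gives the expression in blue. Expanded, it reads ${\bf X}{\bf A}{\bf A}^\top - {\bf B}{\bf X}{\bf A}^\top - {\bf B}^\top{\bf X}{\bf A} + {\bf B}^\top{\bf B}{\bf X}$, so the matrix in the question is the transpose of the gradient: it is $n \times m$, whereas the gradient must be $m \times n$, like $C$.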
Addendum
Suppose that we would like to find where the gradient vanishes. We then have the following linear matrix equation.
$$ ({\bf X} {\bf A} - {\bf B} {\bf X}) {\bf A}^\top - {\bf B}^\top ({\bf X} {\bf A} - {\bf B} {\bf X}) = {\bf O}_{m \times n} $$
Vectorizing both sides, we obtain the following homogeneous linear system
$$ \left( \left( {\bf A} {\bf A}^\top \otimes {\bf I}_m \right) - \left( {\bf A} \otimes {\bf B} \right) - \left( {\bf A} \otimes {\bf B} \right)^\top + \left( {\bf I}_n \otimes {\bf B}^\top {\bf B}\right) \right) \operatorname{vec} ({\bf X}) = {\bf 0}_{mn} $$
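As a sanity check, both the gradient formula and its vectorized form can be tested numerically. The sketch below assumes NumPy, small random matrices, and column-stacking for $\operatorname{vec}$. The names `f`, `grad_f`, `vec`, `M` and `K` are illustrative and not part of the answer.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4                       # illustrative sizes
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))
X = rng.standard_normal((m, n))

def f(X):
    # f(X) = 1/2 * ||X A - B X||_F^2
    return 0.5 * np.linalg.norm(X @ A - B @ X, "fro") ** 2

def grad_f(X):
    # Closed-form gradient: (X A - B X) A^T - B^T (X A - B X)
    R = X @ A - B @ X
    return R @ A.T - B.T @ R

# Central finite differences, one entry of X at a time.
eps = 1e-6
G_fd = np.zeros_like(X)
for i in range(m):
    for j in range(n):
        E = np.zeros_like(X)
        E[i, j] = eps
        G_fd[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

def vec(Z):
    # Column-stacking vectorization, matching vec(B X A^T) = (A kron B) vec(X).
    return Z.reshape(-1, order="F")

# Coefficient matrix of the homogeneous linear system.
AB = np.kron(A, B)
M = np.kron(A @ A.T, np.eye(m)) - AB - AB.T + np.kron(np.eye(n), B.T @ B)

# vec(X A - B X) = K vec(X), and M should equal K^T K.
K = np.kron(A.T, np.eye(m)) - np.kron(np.eye(n), B)

print(np.allclose(G_fd, grad_f(X), atol=1e-5))   # finite differences vs. formula
print(np.allclose(M @ vec(X), vec(grad_f(X))))   # Kronecker form vs. formula
print(np.allclose(M, K.T @ K))                   # M is symmetric positive semidefinite
```

Since the coefficient matrix factors as ${\bf K}^\top {\bf K}$ with ${\bf K} = {\bf A}^\top \otimes {\bf I}_m - {\bf I}_n \otimes {\bf B}$, the gradient vanishes exactly when ${\bf K} \operatorname{vec}({\bf X}) = {\bf 0}_{mn}$, that is, when ${\bf X}{\bf A} = {\bf B}{\bf X}$.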