[Math] Gradient of $\operatorname{trace}(ABA^TC)$ w.r.t. a matrix $A$

derivatives, linear algebra, vector analysis

With $n \times n$ matrices $A$, $B$, $C$, I was trying to find $\nabla_A \operatorname{trace}(ABA^TC)$.

This answer: Proof for the funky trace derivative: $d (\operatorname{trace} (ABA'C))$?

suggested:
$$
\nabla_A \operatorname{trace}( ABA^{T}C ) = CAB + C^T AB^T
$$

with the implication that
$$\nabla_A AB = B^T$$

Can somebody show me why?

I also have my own proof, based on the clue from that link (using the chain rule).

First, let
$$
H(X,Y) = trace(XY^TC) \qquad\qquad (1)
\\
f(A) = AB \qquad\qquad (2)
\\
g(A) = trace(ABA^TC) \qquad\qquad(3)
$$
$g(A)$ can be rewritten as:
$$
g(A) = H(f(A),A)$$
we know the chain rule:
$$
\nabla_A g(A) = \nabla_XH(X,Y)\cdot \nabla_Af(A)+\nabla_YH(X,Y)\cdot \nabla_AA
$$
to simplify this equation, we need:
$$
\nabla_A trace(AB) = B^T \qquad\qquad (4)\\
trace(AB) = trace(BA) \qquad\qquad (5)\\
\nabla_{A^T}f(A) = [\nabla_Af(A)]^T \qquad\qquad(6)
$$
with (4), the first term
$$
\nabla_XH(X,Y)\cdot \nabla_Af(A)
$$
can be written as:
$$
\nabla_X trace(XY^TC) \cdot \nabla_A f(A) = C^TY \cdot \nabla_A AB = C^TA \cdot \nabla_A AB
$$
and with (5), the second term's factor $\nabla_YH(X,Y)$ can be written as:
$$
\nabla_YH(X,Y) = \nabla_Y trace(XY^TC) \\
=\nabla_Y trace(Y^TCX)
$$
with (6):
$$
\nabla_Y trace(Y^TCX) = [\nabla_{Y^T} trace(Y^TCX)]^T
$$
with (4):
$$
[\nabla_{Y^T} trace(Y^TCX)]^T = CX = CAB
$$
Now I get
$$
\nabla_A \operatorname{trace}( ABA^{T}C ) = C^T A \cdot \nabla_A AB + CAB
$$

But I'm not sure that $\nabla_A AB = B^T$. Can somebody show me why, or give me another proof?

Thank you for your honest suggestions!

Best Answer

The problem is much easier if you use the Frobenius inner product, $A:B = {\rm tr}(A^TB)$, instead of the trace.

Write the objective function and find its differential
$$\eqalign{
f &= {\rm tr}(ABA^TC) \cr
&= I:ABA^TC \cr\cr
df &= I:(dA)BA^TC + I:AB(dA^T)C \cr
&= C^TAB^T:dA + B^TA^TC^T:dA^T \cr
&= C^TAB^T:dA + CAB:dA \cr
&= (C^TAB^T + CAB):dA \cr
}$$
where some of the expressions were rearranged using these mixed product rules
$$\eqalign{
{\rm tr}(A^TBC) &= A:BC \cr
&= AC^T:B \cr
&= B^TA:C \cr
&= A^T:(BC)^T \cr
}$$
which are derived from the cyclic property of the trace function.
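
As a worked instance of these rules, here is a sketch of how the first term of $df$ can be rearranged, using only $X:Y={\rm tr}(X^TY)$ and the cyclic property of the trace:
$$\eqalign{
I:(dA)BA^TC &= {\rm tr}\big((dA)\,BA^TC\big) \cr
&= {\rm tr}\big(BA^TC\,dA\big) \cr
&= (BA^TC)^T:dA \cr
&= C^TAB^T:dA \cr
}$$
Read with $G = C^TA$ held fixed (the symbol $G$ is introduced here only for illustration), the same step says ${\rm tr}\big(G^T(dA)B\big) = GB^T:dA$. One way to interpret the question's shorthand $\nabla_A AB = B^T$ is therefore as right-multiplication by $B^T$, so that $C^TA\cdot\nabla_A AB$ stands for $C^TAB^T$.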

Anyway, since $df=\big(\frac{\partial f}{\partial A}:dA\big),\,$ the gradient of the function must be $$\eqalign{ \frac{\partial f}{\partial A} &= C^TAB^T + CAB \cr }$$
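
A quick numerical sanity check can make the result more convincing. The sketch below assumes NumPy is available and compares the closed-form gradient $C^TAB^T + CAB$ against central finite differences on random $4\times 4$ matrices. The helper names `f`, `analytic_grad` and `numeric_grad`, the matrix size, and the step `eps` are illustrative choices, not part of the original answer.

```python
import numpy as np

def f(A, B, C):
    # Objective: trace(A B A^T C)
    return np.trace(A @ B @ A.T @ C)

def analytic_grad(A, B, C):
    # Closed-form gradient derived above: C^T A B^T + C A B
    return C.T @ A @ B.T + C @ A @ B

def numeric_grad(A, B, C, eps=1e-6):
    # Central finite differences, one entry of A at a time
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            G[i, j] = (f(A + E, B, C) - f(A - E, B, C)) / (2 * eps)
    return G

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
n = 4                           # illustrative size
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))

max_err = np.max(np.abs(analytic_grad(A, B, C) - numeric_grad(A, B, C)))
print("max abs difference:", max_err)
```

Because $f$ is quadratic in $A$, the central difference has no truncation error, so the printed difference should be small and limited only by floating-point rounding.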