[Math] gradient of product of matrices


I am trying to understand the gradient of a matrix product, but I cannot manage to derive it.
In these notes: https://web.stanford.edu/~jduchi/projects/matrix_prop.pdf (funky trace derivative), I don't understand why the gradient of $AB$ with respect to $A$ equals $B^T$. Also, why is this result the same as the gradient of $\operatorname{tr}(AB)$ with respect to $A$, which is also $B^T$?

Can someone please explain how to derive the gradient of a matrix product, and what the appropriate dimension of this gradient is? Since the trace is a scalar, a gradient with the dimensions of $B^T$ makes sense to me, because it should have the same dimensions as $A$. But I can't see how to get the gradient of a product of matrices, or what its dimension should be.

Best Answer

Suppose $f(A)=\operatorname{tr} (AB)$. Then $f(A+H)-f(A) = \operatorname{tr} (HB)$, so we have $Df(A)(H) = \operatorname{tr} (HB)$. (Not surprising, since the trace is linear.)
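A quick numerical sanity check that the increment is exactly $\operatorname{tr}(HB)$, with no higher-order remainder even for a large $H$. This is only a sketch: matrices are plain Python lists of rows, the helpers `matmul`, `trace` and `add` are our own, and the numbers in `A`, `B`, `H` are arbitrary illustrative data.

```python
# Matrices are plain lists of rows; no third-party libraries are needed.
def matmul(X, Y):
    """Return the matrix product XY."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def trace(X):
    """Sum of the diagonal entries of a square matrix."""
    return sum(X[i][i] for i in range(len(X)))

def add(X, Y):
    """Entrywise sum X + Y."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

# Hypothetical example data: A is 2x3 and B is 3x2, so AB is 2x2.
A = [[1.0, 2.0, 0.5], [-1.0, 0.0, 3.0]]
B = [[2.0, 1.0], [0.0, -1.0], [4.0, 0.5]]
H = [[10.0, -3.0, 7.0], [2.0, 5.0, -8.0]]  # deliberately not small

def f(M):
    """f(M) = tr(M B)."""
    return trace(matmul(M, B))

increment = f(add(A, H)) - f(A)    # f(A + H) - f(A)
linear_part = trace(matmul(H, B))  # tr(H B)
print(increment, linear_part)      # 41.0 41.0 for this data
```

Because $f$ is linear, the two numbers agree for any $H$, not just small ones.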

In a Hilbert space, the gradient of a functional is an element $\nabla f(A)$ such that $Df(A)(H) = \langle \nabla f(A), H \rangle$ for all $H$.

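With the (Frobenius) inner product $\langle X, Y \rangle = \operatorname{tr} (X^T Y)$, the cyclic property of the trace gives $\operatorname{tr} (HB) = \operatorname{tr} (BH) = \operatorname{tr} ((B^T)^T H) = \langle B^T, H \rangle$, so we see that $\nabla f(A) = B^T$. In particular, $\nabla f(A)$ has the same dimensions as $A$.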
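The passage from $Df(A)$ to $\nabla f(A)$ can be checked entry by entry: choosing $H = E_{ij}$ (the matrix with a single 1 in position $(i,j)$) gives $\langle G, E_{ij} \rangle = \operatorname{tr}(G^T E_{ij}) = G_{ij}$, so each entry of the gradient is a directional derivative along one basis matrix. The sketch below estimates those derivatives with central differences and compares them with $B^T$. It assumes hypothetical list-of-rows helpers and a non-square $A$ ($2 \times 3$) so that the shapes are visible; `numerical_gradient` is our own illustrative function.

```python
def matmul(X, Y):
    """Return the matrix product XY for lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def numerical_gradient(f, M, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function of a matrix.

    Perturbing only entry (i, j) means H = E_ij, and <G, E_ij> = G[i][j],
    so entry (i, j) of the result approximates Df(M)(E_ij).
    """
    G = []
    for i in range(len(M)):
        row = []
        for j in range(len(M[0])):
            M_plus = [r[:] for r in M]
            M_minus = [r[:] for r in M]
            M_plus[i][j] += eps
            M_minus[i][j] -= eps
            row.append((f(M_plus) - f(M_minus)) / (2 * eps))
        G.append(row)
    return G

A = [[1.0, 2.0, 0.5], [-1.0, 0.0, 3.0]]    # 2x3
B = [[2.0, 1.0], [0.0, -1.0], [4.0, 0.5]]  # 3x2, so AB is 2x2

G = numerical_gradient(lambda M: trace(matmul(M, B)), A)
print(len(G), len(G[0]))  # 2 3: the shape of A, which is also the shape of B^T
print(G)                  # approximately equal to transpose(B)
print(transpose(B))       # [[2.0, 0.0, 4.0], [1.0, -1.0, 0.5]]
```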

This is entirely analogous to a function $g : \mathbb{R}^n \to \mathbb{R}$: the derivative $Dg(x)$ is usually written as a row vector, while the gradient $\nabla g(x)$ is the corresponding column vector, its transpose.
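A concrete instance of this analogy (a linear $g$, chosen here purely for illustration): for a fixed $b \in \mathbb{R}^n$, let $g(x) = b^T x$. Then

$$
g(x+h) - g(x) = b^T h, \qquad Dg(x) = b^T \ (1 \times n), \qquad \nabla g(x) = b \ (n \times 1).
$$

The matrix result contains this case: taking $A = x^T$ ($1 \times n$) and $B = b$ ($n \times 1$) gives $f(A) = \operatorname{tr}(AB) = x^T b = g(x)$ and $\nabla f(A) = B^T = b^T$, which is $\nabla g(x)$ written in the $1 \times n$ shape of $A$.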

Addendum:

Let $f(A) = \operatorname{tr} (A B A^T C)$. Then we have $f(A+H)-f(A) = \operatorname{tr} (H B A^T C)+\operatorname{tr} (A B H^T C)+\operatorname{tr} (H B H^T C)$. The last term is of order $O(\|H\|^2)$, so we see that $Df(A)(H) = \operatorname{tr} (H B A^T C)+\operatorname{tr} (A B H^T C) $.
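To see that the discarded term really is second order, note that the remainder $f(A+tH) - f(A) - Df(A)(tH)$ should equal $t^2 \operatorname{tr}(HBH^TC)$, so it shrinks by a factor of 4 each time $t$ is halved. A minimal pure-Python check, assuming hypothetical shapes $A, H \in \mathbb{R}^{2 \times 3}$, $B \in \mathbb{R}^{3 \times 3}$, $C \in \mathbb{R}^{2 \times 2}$, arbitrary example entries, and our own helper functions:

```python
def matmul(X, Y):
    """Return the matrix product XY for lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(t, X):
    return [[t * x for x in row] for row in X]

# Arbitrary example data (hypothetical): A, H are 2x3, B is 3x3, C is 2x2.
A = [[1.0, 0.0, 2.0], [-1.0, 3.0, 1.0]]
B = [[2.0, 1.0, 0.0], [0.0, 1.0, -1.0], [1.0, 0.0, 3.0]]
C = [[1.0, 2.0], [0.5, -1.0]]
H = [[0.3, -0.2, 0.1], [0.4, 0.0, -0.5]]

def f(M):
    """f(M) = tr(M B M^T C)."""
    return trace(matmul(matmul(matmul(M, B), transpose(M)), C))

def Df(M, D):
    """Linear part of the increment: tr(D B M^T C) + tr(M B D^T C)."""
    return (trace(matmul(matmul(matmul(D, B), transpose(M)), C))
            + trace(matmul(matmul(matmul(M, B), transpose(D)), C)))

def remainder(t):
    """f(A + tH) - f(A) - Df(A)(tH), expected to equal t^2 tr(H B H^T C)."""
    tH = scale(t, H)
    return f(add(A, tH)) - f(A) - Df(A, tH)

quadratic = trace(matmul(matmul(matmul(H, B), transpose(H)), C))
for t in (1.0, 0.5, 0.25):
    # The last column should stay (up to rounding) at the value of `quadratic`.
    print(t, remainder(t), remainder(t) / t**2)
```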

The relevant properties of the trace are (i) transpose invariance, $\operatorname{tr} X = \operatorname{tr} X^T$, and (ii) shift (cyclic) invariance, $\operatorname{tr} (X_1 X_2 \cdots X_n) = \operatorname{tr} (X_2 \cdots X_n X_1)$.
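Both trace properties are easy to test numerically. Note that (ii) allows only cyclic shifts: with the rectangular shapes chosen below ($2 \times 3$, $3 \times 4$, $4 \times 2$, an arbitrary choice) the three cyclic products are square matrices of different sizes, yet they have the same trace, while a non-cyclic reordering such as $X_1 X_3 X_2$ is not even defined. The seeded random matrices and the helpers are illustrative assumptions only.

```python
import random

def matmul(X, Y):
    """Return the matrix product XY for lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

rng = random.Random(0)  # fixed seed so the run is reproducible

def random_matrix(m, n):
    """An m x n matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(m)]

X1, X2, X3 = random_matrix(2, 3), random_matrix(3, 4), random_matrix(4, 2)
S = random_matrix(3, 3)

# (ii) cyclic shifts: the products are 2x2, 3x3 and 4x4, but the traces agree.
t123 = trace(matmul(matmul(X1, X2), X3))
t231 = trace(matmul(matmul(X2, X3), X1))
t312 = trace(matmul(matmul(X3, X1), X2))
print(t123, t231, t312)

# (i) transpose invariance: transposition leaves the diagonal unchanged.
print(trace(S), trace(transpose(S)))
```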

Applying these gives \begin{eqnarray} Df(A)(H) &=& \operatorname{tr} (B A^T C H)+\operatorname{tr} (C A B H^T) \\ &=& \operatorname{tr} ((C^T A B^T)^T H)+\operatorname{tr} ((CAB)^T H) \\ &=& \langle C^T A B^T + CAB, H \rangle \end{eqnarray} from which we get the gradient $\nabla f(A) = C^T A B^T + CAB$.
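A finite-difference check of the final formula, as a sketch with arbitrary example matrices (shapes $2 \times 3$, $3 \times 3$, $2 \times 2$, chosen by us) and plain-Python helpers of our own. Since $f$ is quadratic in $A$, central differences are exact up to rounding, so the estimate should match $C^T A B^T + CAB$ closely; the result also has the shape of $A$.

```python
def matmul(X, Y):
    """Return the matrix product XY for lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

# Arbitrary example data (hypothetical): A is 2x3, B is 3x3, C is 2x2.
A = [[1.0, 0.0, 2.0], [-1.0, 3.0, 1.0]]
B = [[2.0, 1.0, 0.0], [0.0, 1.0, -1.0], [1.0, 0.0, 3.0]]
C = [[1.0, 2.0], [0.5, -1.0]]

def f(M):
    """f(M) = tr(M B M^T C)."""
    return trace(matmul(matmul(matmul(M, B), transpose(M)), C))

def numerical_gradient(func, M, eps=1e-5):
    """Central differences along each basis matrix E_ij."""
    G = []
    for i in range(len(M)):
        row = []
        for j in range(len(M[0])):
            M_plus = [r[:] for r in M]
            M_minus = [r[:] for r in M]
            M_plus[i][j] += eps
            M_minus[i][j] -= eps
            row.append((func(M_plus) - func(M_minus)) / (2 * eps))
        G.append(row)
    return G

G_numeric = numerical_gradient(f, A)
# Closed form from the answer: C^T A B^T + C A B, a 2x3 matrix like A.
G_formula = add(matmul(matmul(transpose(C), A), transpose(B)),
                matmul(matmul(C, A), B))
max_error = max(abs(g - h) for rg, rh in zip(G_numeric, G_formula)
                for g, h in zip(rg, rh))
print(max_error)  # should be tiny: only floating-point rounding remains
```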
