I am looking at The Matrix Cookbook, which says
$$\frac{\partial \text{Tr}(f(X))}{\partial X} = f'(X)^T$$
where $f'$ is just the derivative of the scalar function $f$. This is from Section 2.5. How does one prove this?
Best Answer
$\DeclareMathOperator{\trace}{\text{Tr}}$
By the chain rule for the Fréchet derivative,
$$ d(\trace\circ f)(X)[H] = (d\trace)(f(X))\big[ df(X) [H] \big].$$ Since the trace is linear, $(d\trace)(Y)[V] = \trace V$ for any $Y$, so
$$ d(\trace\circ f)(X)[H] = \trace\big( df(X) [H] \big).$$
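As a concrete instance of the right-hand side, take the illustrative choice $f(x) = x^3$, so that $f'(x) = 3x^2$. Expanding $(X+H)^3$ to first order in $H$ gives $df(X)[H]$, and cyclicity of the trace collapses the three non-commuting terms:
$$\begin{aligned} df(X)[H] &= HX^2 + XHX + X^2H,\\ \trace\big(df(X)[H]\big) &= \trace(X^2H) + \trace(X^2H) + \trace(X^2H) = \trace\big(3X^2 H\big) = \trace\big(f'(X)\,H\big). \end{aligned}$$
Note that $df(X)[H]$ itself is generally not $3X^2H$; only its trace is.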
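Recall that if you write vectors as columns, the matrix representation of a linear map $\mathbb R^n \to \mathbb R$ is a row vector; thus if $\nabla u(x)$ is a column vector, we have $du(x)[h] = \nabla u(x)^T h$, i.e. the matrix representative of $du(x)$ is $\nabla u(x)^T$. The "same" is true for functions with matrix input: if $\frac{\partial U}{\partial X}(X)$ denotes the matrix whose $(i,j)$ entry is $\frac{\partial U}{\partial X_{ij}}(X)$ (the cookbook's convention), then the matrix representative of $dU(X)$ is $\left(\frac{\partial U}{\partial X}(X)\right)^T$, with
$$ dU(X)[H] = \trace \left(\left(\frac{\partial U}{\partial X}(X)\right)^T H\right) = \frac{\partial U}{\partial X}(X) : H .$$
Here the double contraction $A:B \overset{\Delta}{=} \trace (A^T B) = \sum_{i,j} A_{ij}B_{ij}$ was introduced.

It remains to evaluate $\trace\big(df(X)[H]\big)$. If $f$ is given by a power series $f(x) = \sum_k c_k x^k$ (as for polynomials or $\exp$), the product rule gives $d(X^k)[H] = \sum_{j=0}^{k-1} X^j H X^{k-1-j}$, and by cyclicity of the trace each of these $k$ terms has trace $\trace(X^{k-1}H)$. Hence
$$ \trace\big(df(X)[H]\big) = \sum_k k\, c_k \trace\left(X^{k-1} H\right) = \trace\big(f'(X)\, H\big) = f'(X)^T : H .$$
Thus, with $U = \trace\circ f$,
$$ \frac{\partial}{\partial X}\trace(f(X)) : H = f'(X)^T : H .$$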
Since this holds for every $H$, comparing the two sides gives $\frac{\partial \trace(f(X))}{\partial X} = f'(X)^T$.
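A quick numerical sanity check (a sketch, not part of the proof): the Python below compares central finite differences of $\text{Tr}(f(X))$ with $f'(X)^T$, using the entrywise convention above. The choice $f(x) = x^3 + 2x^2$, the test matrix, and all helper names are hypothetical, picked only for illustration; $X$ is deliberately non-symmetric so that omitting the transpose would be detectable.

```python
# Numerical sanity check of  d Tr(f(X)) / dX = f'(X)^T.
# Convention (as in the cookbook): entry (i, j) of the gradient is the
# partial derivative of Tr(f(X)) with respect to X[i][j].
# Illustrative assumption: f(x) = x^3 + 2x^2, hence f'(x) = 3x^2 + 4x.

def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def lincomb(a, A, b, B):
    """Entrywise linear combination a*A + b*B."""
    n = len(A)
    return [[a * A[i][j] + b * B[i][j] for j in range(n)] for i in range(n)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def transpose(A):
    return [list(row) for row in zip(*A)]


def f(X):
    """Matrix function f(X) = X^3 + 2 X^2."""
    X2 = matmul(X, X)
    return lincomb(1.0, matmul(X2, X), 2.0, X2)


def f_prime(X):
    """Matrix function f'(X) = 3 X^2 + 4 X."""
    return lincomb(3.0, matmul(X, X), 4.0, X)


def numerical_gradient(X, eps=1e-6):
    """Central differences of Tr(f(X)) with respect to each entry X[i][j]."""
    n = len(X)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += eps
            Xm[i][j] -= eps
            G[i][j] = (trace(f(Xp)) - trace(f(Xm))) / (2 * eps)
    return G


# Non-symmetric test matrix, so that f'(X) and f'(X)^T differ.
X = [[1.0, 2.0, 0.0],
     [0.5, -1.0, 3.0],
     [0.0, 1.5, 2.0]]

G_num = numerical_gradient(X)
G_formula = transpose(f_prime(X))   # the claimed gradient f'(X)^T
G_untransposed = f_prime(X)         # what we would get without the transpose

n = len(X)
max_err = max(abs(G_num[i][j] - G_formula[i][j])
              for i in range(n) for j in range(n))
wrong_err = max(abs(G_num[i][j] - G_untransposed[i][j])
                for i in range(n) for j in range(n))
print(f"max |numerical - f'(X)^T| = {max_err:.2e}")
print(f"max |numerical - f'(X)|   = {wrong_err:.2e}")
```

The first difference should be at the level of finite-difference round-off, while the second is of order one, because $f'(X)$ is not symmetric for this $X$.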