$X$ is a matrix, prove $\frac{\partial \text{Tr}(f(X))}{\partial X} = f'(X)^T$

Tags: linear-algebra, matrices, matrix-calculus

I am looking at The Matrix Cookbook, which says

$$\frac{\partial \text{Tr}(f(X))}{\partial X} = f'(X)^T$$

where $f'$ is just the derivative of the scalar function $f$. This is from Section 2.5. How does one prove this?

Best Answer

$\DeclareMathOperator{\trace}{\text{Tr}}$

By the chain rule for the Fréchet derivative,

$$ d(\trace\circ f)(X)[H] = (d\trace)(f(X))\big[ df(X)[H] \big].$$ Since the trace is linear, $(d\trace)(Y)[V] = \trace V$ for any $Y$, so

$$ d(\trace\circ f)(X)[H] = \trace\big( df(X)[H] \big).$$
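As a numerical sanity check of this chain-rule step, here is a minimal pure-Python sketch. It assumes the illustrative choice $f(x) = x^3$, for which the product rule gives $df(X)[H] = HX^2 + XHX + X^2H$; the matrices `X`, `H` and the helper names are hypothetical. It compares a central difference of $t \mapsto \trace(f(X + tH))$ with $\trace(df(X)[H])$, and also with $\trace(3X^2 H) = \trace(f'(X)H)$, which is what cyclicity of the trace turns $\trace(df(X)[H])$ into.

```python
# Sketch (illustrative f(x) = x**3): check d(Tr o f)(X)[H] = Tr(df(X)[H]) numerically.

def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(*Ms):
    """Entrywise sum of several square matrices."""
    n = len(Ms[0])
    return [[sum(M[i][j] for M in Ms) for j in range(n)] for i in range(n)]

def scale(c, A):
    """Multiply every entry of A by the scalar c."""
    return [[c * a for a in row] for row in A]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def f(X):
    """Matrix version of the scalar function x**3, i.e. X^3."""
    return matmul(X, matmul(X, X))

def df(X, H):
    """Frechet derivative of X -> X^3 in direction H (product rule)."""
    X2 = matmul(X, X)
    return add(matmul(H, X2), matmul(X, matmul(H, X)), matmul(X2, H))

X = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0], [4.0, 1.0, 2.0]]  # deliberately non-symmetric
H = [[0.3, -0.2, 1.0], [0.5, 0.1, 0.0], [-1.0, 0.7, 0.2]]  # arbitrary direction

eps = 1e-5
# Central difference of t -> Tr(f(X + tH)) at t = 0
fd = (trace(f(add(X, scale(eps, H)))) - trace(f(add(X, scale(-eps, H))))) / (2 * eps)
exact = trace(df(X, H))                                        # Tr(df(X)[H])
via_fprime = trace(matmul(scale(3.0, matmul(X, X)), H))        # Tr(f'(X) H), f'(x) = 3x^2
print(fd, exact, via_fprime)
```

The three printed numbers should agree up to finite-difference and rounding error.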

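Recall that if you write vectors as columns, the matrix representation of a linear map $\mathbb R^n \to \mathbb R$ is a row vector; thus if $\nabla u(x)$ is a column vector, we have $du(x)h = \nabla u(x)^T h$, i.e. the matrix representation of $du(x)$ is $\nabla u(x)^T$. The same holds for functions with matrix input: with the cookbook's layout $\left(\frac{\partial U}{\partial X}\right)_{ij} = \frac{\partial U}{\partial X_{ij}}$, the matrix representative of $dU(X)$ is $\left(\frac{\partial U}{\partial X}(X)\right)^T$, in the sense that $$ dU(X)[H] = \sum_{i,j} \frac{\partial U}{\partial X_{ij}}(X)\, H_{ij} = \frac{\partial U}{\partial X}(X) : H \overset{\Delta}{=} \trace \left(\left(\frac{\partial U}{\partial X}(X)\right)^T H\right).$$ Here the double contraction $A:B \overset{\Delta}{=} \trace (A^T B) = \sum_{i,j} A_{ij}B_{ij}$ was introduced.

It remains to evaluate $\trace(df(X)[H])$. A scalar function is applied to a matrix through its power series: if $f(x) = \sum_k c_k x^k$, then $f(X) = \sum_k c_k X^k$ (with $X$ inside the region of convergence). By the product rule, $d(X^k)[H] = \sum_{m=0}^{k-1} X^m H X^{k-1-m}$, and by cyclicity of the trace each of these $k$ terms has trace $\trace(X^{k-1}H)$. Hence $$ \trace\big(df(X)[H]\big) = \sum_{k \ge 1} k\, c_k \trace(X^{k-1}H) = \trace\big(f'(X) H\big) = f'(X)^T : H .$$ Combining this with the chain-rule identity above, $$ \frac{\partial }{\partial X}\trace(f(X)) : H = f'(X)^T : H \quad \text{for all } H .$$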

Comparing LHS and RHS gives the result.
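To see that the transpose really belongs in the result, here is a minimal pure-Python sketch, again assuming the illustrative choice $f(x) = x^3$ (so $f'(X) = 3X^2$) and a hypothetical non-symmetric test matrix `X`. It builds the gradient entrywise in the cookbook layout, $G_{ij} \approx \partial \trace(f(X)) / \partial X_{ij}$, by central differences and compares it with $f'(X)^T$.

```python
# Sketch (illustrative f(x) = x**3): entrywise gradient of Tr(f(X)) vs f'(X)^T = (3 X^2)^T.

def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def tr_f(X):
    """Tr(f(X)) for f(x) = x**3, i.e. Tr(X^3)."""
    return trace(matmul(X, matmul(X, X)))

def numerical_gradient(g, X, eps=1e-5):
    """Matrix G with G[i][j] ~ dg/dX_ij (cookbook layout), by central differences."""
    n = len(X)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += eps
            Xm[i][j] -= eps
            G[i][j] = (g(Xp) - g(Xm)) / (2 * eps)
    return G

X = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0], [4.0, 1.0, 2.0]]  # non-symmetric, so X^2 is too

G = numerical_gradient(tr_f, X)
fprime = [[3.0 * v for v in row] for row in matmul(X, X)]    # f'(X) = 3 X^2
fprime_T = transpose(fprime)                                # f'(X)^T
max_err = max(abs(G[i][j] - fprime_T[i][j]) for i in range(3) for j in range(3))
print(max_err)
```

Because $X^2$ is not symmetric here, `G` matches `fprime_T` but not `fprime`, which is exactly the distinction the transpose in $f'(X)^T$ records.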
