The directional derivative of a scalar-valued function with respect to a matrix

derivatives matrix-calculus

According to Wikipedia (under "Scalar-by-Matrix"), the directional derivative of a function $f(\mathbf{X}) \in \mathbb{R}$ in the direction of $\mathbf{Y}$ is given by

$$D_{\mathbf{Y}}f(\mathbf{X}) = \operatorname{tr}\left(\frac{\partial f}{\partial\mathbf{X}}\mathbf{Y}\right)$$

where $\mathbf{X}$ and $\mathbf{Y}$ are matrices.

Can someone point me to a derivation of this fact?

Best Answer

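Perhaps it would be more intuitive to start with the total differential of the function,

$$df = \sum_{i}\sum_{j} \frac{\partial f}{\partial X_{ij}} \, dX_{ij}$$

To obtain the directional derivative, set $dX=Y$ and note that the trace operation can itself be defined as a double summation, i.e.

$${\rm Tr}(A^TB) = \sum_{i}\sum_{j} A_{ij}B_{ij}$$

Taking $A_{ij} = \partial f/\partial X_{ij}$ and $B = Y$ then gives $D_{Y}f(X) = {\rm Tr}(A^TY)$. This agrees with the quoted formula, assuming $\frac{\partial f}{\partial X}$ there denotes the transposed matrix $A^T$ (the numerator-layout convention, in which the derivative of a scalar with respect to a $p \times q$ matrix is written as a $q \times p$ matrix).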
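As a quick numerical sanity check of the double-sum form of the trace, here is a minimal sketch in plain Python. Everything specific in it is an illustrative assumption rather than part of the question: the example function $f(X) = \sum_{ij} X_{ij}^2$ (whose entrywise gradient is $2X$), the helper names, and the two test matrices. It compares $\sum_{ij} \frac{\partial f}{\partial X_{ij}} Y_{ij}$ against a central-difference estimate of $\frac{d}{dt} f(X + tY)$ at $t = 0$.

```python
def f(X):
    """Example scalar function (assumed for illustration): f(X) = sum_ij X_ij^2."""
    return sum(x * x for row in X for x in row)


def grad_f(X):
    """Entrywise gradient G_ij = df/dX_ij = 2 * X_ij for the example f above."""
    return [[2.0 * x for x in row] for row in X]


def trace_form(G, Y):
    """Tr(G^T Y), computed as the double sum over G_ij * Y_ij."""
    return sum(g * y for g_row, y_row in zip(G, Y) for g, y in zip(g_row, y_row))


def finite_difference(func, X, Y, h=1e-6):
    """Central-difference estimate of d/dt func(X + t*Y) at t = 0."""
    plus = [[x + h * y for x, y in zip(xr, yr)] for xr, yr in zip(X, Y)]
    minus = [[x - h * y for x, y in zip(xr, yr)] for xr, yr in zip(X, Y)]
    return (func(plus) - func(minus)) / (2.0 * h)


# Arbitrary 2x3 test matrices (hypothetical values).
X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
Y = [[0.5, -1.0, 2.0], [0.0, 1.5, -0.5]]

exact = trace_form(grad_f(X), Y)       # sum_ij (df/dX_ij) * Y_ij
approx = finite_difference(f, X, Y)    # numerical directional derivative
print(exact, approx)
```

The two printed numbers should agree to within finite-difference error, which is what the identity $D_Y f(X) = {\rm Tr}(A^T Y)$ with $A_{ij} = \partial f/\partial X_{ij}$ predicts. Note that the sketch works with the entrywise gradient directly, so no transpose appears in the code.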