Going from the differential to the derivative (Frechet and matrix calculus)

Tags: derivatives, frechet-derivative, matrix-calculus

For a function $f: A \rightarrow B$ between normed spaces, Frechet differentiability at $X$ means that there exists a bounded linear operator $G: A \rightarrow B$ that satisfies

$$\lim_{H\rightarrow 0} \frac{||f[X+H] - f[X] - G[H]||}{||H||} = 0$$

This means that $G[H]$ is a good approximation of the change $f[X+H] - f[X]$ for every sufficiently small $H\in A$. That is, the differential $df(X): A \rightarrow B$ is given by

$$df(X)[H] = G[H]$$

Wikipedia says that $G$ is defined as the Frechet derivative of $f$ at $X$. But I have a little trouble connecting this to the traditional notion of the derivative, where we have a fraction, i.e. something like $\frac{dy}{dx}$.


For example, consider a standard formula from the matrix cookbook

$$\frac{dTr(XA)}{dX} = A^T$$

The Frechet differentiability definition gets me up to

$$dTr(XA)[H] = G[H] = Tr(HA).$$

What is then done is
\begin{align}
dTr(XA)[H] &= Tr(HA) \\
&= A^T :H,
\end{align}

where we just use the notation $A:B = Tr(A^TB)$. What is the correct way to go from here to the conclusion that
$$\frac{dTr(XA)}{dX} = A^T$$

and what does the LHS even represent exactly since it's not a fraction in the traditional sense?


More generally, what is involved in going from the differential form i.e. $df(X)[H] = G[H]$ to the derivative form $G = \frac{df(X)}{dX}$?

Best Answer

If $f: A\to B$ is Frechet differentiable then for all $X,V\in A$ the directional derivative

$$d_Vf(X)=\lim_{t\to 0}\frac{f(X+tV)-f(X)}{t}$$

exists and $d_Vf(X)=df(X)(V)$.
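As a quick numerical sanity check of this limit, here is a minimal sketch in plain Python. The test function $f(X)=Tr(X^2)$, its differential $df(X)(V)=2\,Tr(XV)$, and all helper names are assumptions chosen for illustration; they are not part of the question.

```python
# Sketch: compare the difference quotient (f(X + tV) - f(X)) / t with df(X)(V).
# Assumption: f(X) = Tr(X X), whose differential is df(X)(V) = 2 Tr(X V).

def matmul(P, Q):
    """Product of two square matrices given as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(P):
    """Sum of the diagonal entries."""
    return sum(P[i][i] for i in range(len(P)))

def f(X):
    return trace(matmul(X, X))

def df(X, V):
    """Differential of f at X applied to the direction V."""
    return 2.0 * trace(matmul(X, V))

def difference_quotient(X, V, t):
    """(f(X + tV) - f(X)) / t for a fixed step t."""
    n = len(X)
    X_shifted = [[X[i][j] + t * V[i][j] for j in range(n)] for i in range(n)]
    return (f(X_shifted) - f(X)) / t

X = [[1.0, 2.0], [3.0, 4.0]]
V = [[0.5, -1.0], [2.0, 0.0]]

exact = df(X, V)
# Gap between the difference quotient and df(X)(V) for shrinking t.
errors = [abs(difference_quotient(X, V, t) - exact) for t in (1e-1, 1e-2, 1e-3)]
print(exact, errors)
```

For this particular $f$ the gap is $|t\,Tr(V^2)|$, so it should shrink linearly as $t\to 0$.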

Now if $A=\mathbb R^{n\times n}$, $B=\mathbb R$ and $G = \frac{df(X)}{dX}$ then $G_{i,j}$ is just the directional derivative $d_{H_{i,j}}f(X)$ where $H_{i,j}$ is the matrix which has a $1$ on the $(i,j)$-th position and is zero otherwise. So the relation is

$$G_{i,j}=df(X)(H_{i,j})$$
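
One way to see why these entries determine the whole differential (a short worked step, writing $V_{i,j}$ for the entries of a direction $V$ and using the notation $A:B = Tr(A^TB)$ from the question): expand $V=\sum_{i,j}V_{i,j}H_{i,j}$ and use the linearity of $df(X)$, which gives

\begin{align}
df(X)(V) &= \sum_{i,j} V_{i,j}\, df(X)(H_{i,j}) \\
&= \sum_{i,j} G_{i,j} V_{i,j} \\
&= Tr(G^TV) = G:V.
\end{align}

Read this way, $\frac{df(X)}{dX}$ is not a quotient but a name for the matrix $G$ that represents the linear map $df(X)$ with respect to this inner product.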

$\bullet$ Now consider the special map $f:\mathbb R^{n\times n}\to\mathbb R$, $X\mapsto Tr(XA)$. As this map is already linear, we have $df(X)=f$ for all $X\in\mathbb R^{n\times n}$, so applying the above relation yields

$$G_{i,j}=df(X)(H_{i,j})=f(H_{i,j})=Tr(H_{i,j}A)=A_{j,i}$$

so $G=A^T$, which can also be computed directly.
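
The entry-by-entry recipe can also be checked numerically. The following is a minimal sketch in plain Python; the matrix `A`, the direction `V`, and the helper names (`basis_matrix`, `gradient_from_differential`) are made up for illustration.

```python
# Sketch: recover G entry by entry from G[i][j] = df(X)(H_ij) for f(X) = Tr(X A).
# Since f is linear, df(X)(V) = f(V) = Tr(V A) for every X.

def matmul(P, Q):
    """Product of two square matrices given as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(P):
    """Sum of the diagonal entries."""
    return sum(P[i][i] for i in range(len(P)))

def basis_matrix(n, i, j):
    """H_ij: the n x n matrix with a 1 at position (i, j) and zeros elsewhere."""
    E = [[0.0] * n for _ in range(n)]
    E[i][j] = 1.0
    return E

def gradient_from_differential(differential, n):
    """Evaluate the differential on every basis matrix to fill in G."""
    return [[differential(basis_matrix(n, i, j)) for j in range(n)]
            for i in range(n)]

# Hypothetical example data.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 10.0]]
n = len(A)

def differential(V):
    """df(X)(V) = Tr(V A), independent of X because f is linear."""
    return trace(matmul(V, A))

G = gradient_from_differential(differential, n)
A_transpose = [[A[j][i] for j in range(n)] for i in range(n)]

# The differential in an arbitrary direction V should equal sum_ij G_ij V_ij.
V = [[0.5, -1.0, 2.0],
     [0.0, 3.0, 1.5],
     [-2.0, 1.0, 0.25]]
lhs = differential(V)
rhs = sum(G[i][j] * V[i][j] for i in range(n) for j in range(n))
print(G == A_transpose, lhs, rhs)
```

Here `G` matches `A_transpose`, and `lhs` agrees with `rhs`, which is the statement $Tr(VA) = A^T : V$ in the question's notation.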