I was reading on Wikipedia about total derivatives of functions and they stated the following about the chain rule for total derivatives:

Let $f:\mathbb R^m\to \mathbb R^k$ and $g:\mathbb R^n \to \mathbb R^m$ be two differentiable functions and let $a \in \mathbb R^n$. Let $D_{g(a)}f$ denote the total derivative of $f$ at $g(a)$ and $D_a g$ denote the total derivative of $g$ at $a$. Then: $$D_a(f\circ g)=D_{g(a)}f\circ D_a g$$

or, for short: $$D(f\circ g)=Df\circ Dg$$

The thing I'm not understanding is the following: What does $Df\circ Dg$ mean?

Those two total derivatives are defined as functions: $Df: \mathbb R^m\to \mathcal L(\mathbb R^m,\mathbb R^k)$ and $Dg: \mathbb R^n\to \mathcal L(\mathbb R^n,\mathbb R^m)$.

So how is the composition $D(f\circ g)=Df\circ Dg$ defined? Am I missing something or is this a typo?

## Best Answer

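I think you missed something here. Emphasis mine:

> Let $D_{g(a)}f$ denote the total derivative of $f$ **at $g(a)$** and $D_a g$ denote the total derivative of $g$ **at $a$**.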

If we just take the total derivative of a function, yes, you're right. But we're not doing that here. We are talking about the total derivative of functions *at specific points*. And while it is true that $Df:\Bbb R^m\to \mathcal L(\Bbb R^m, \Bbb R^k)$, if we insert some $b\in\Bbb R^m$, we get $D_bf\in\mathcal L(\Bbb R^m, \Bbb R^k)$, and similarly $D_ag\in\mathcal L(\Bbb R^n, \Bbb R^m)$. With $b=g(a)$, these are two linear maps, and they compose in the usual way.

Then they make the subscripted points implicit so that they don't have to type as much and we don't have to read as much. Yes, this is an abuse of notation, and it is ambiguous, as your question shows. But they do warn you by saying "for short". Presumably they won't use $D$ to mean the full "everywhere" total derivative any more, and will only use it for the total derivative at some implicit point, whether arbitrary or given.
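To see the composition of linear maps concretely, here is a small worked example. The particular functions are my own choice for illustration and are not taken from the Wikipedia article: take $g:\Bbb R\to\Bbb R^2$, $g(t)=(t,t^2)$, and $f:\Bbb R^2\to\Bbb R$, $f(x,y)=xy$, so that $(f\circ g)(t)=t^3$. At a point $a\in\Bbb R$, the two derivatives are the linear maps

$$D_a g:\ h\mapsto (h,\,2ah),\qquad D_{(x,y)}f:\ (u,v)\mapsto yu+xv.$$

Evaluating the second one at the point $g(a)=(a,a^2)$ and composing gives

$$\bigl(D_{g(a)}f\circ D_a g\bigr)(h)=a^2\cdot h+a\cdot 2ah=3a^2h,$$

which is exactly $D_a(f\circ g)$, since $\frac{d}{dt}t^3=3t^2$. In terms of Jacobian matrices this is the product

$$\begin{pmatrix} a^2 & a\end{pmatrix}\begin{pmatrix}1\\ 2a\end{pmatrix}=3a^2 .$$

Note that the $\circ$ here composes two elements of $\mathcal L(\cdot,\cdot)$, one in $\mathcal L(\Bbb R^2,\Bbb R)$ and one in $\mathcal L(\Bbb R,\Bbb R^2)$. It does not compose the "everywhere" maps $Df$ and $Dg$, which indeed would not type-check.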