Using the chain rule rigorously, with correct notation


In Kolk's *Multidimensional Real Analysis I: Differentiation*, the chain rule is stated as follows:

[Image: the book's statement of the chain rule]
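For reference, the chain rule is usually stated in the following coordinate-free form. These are the standard hypotheses; the book's exact wording and theorem number may differ. Let $U \subset \mathbf{R}^m$ and $V \subset \mathbf{R}^n$ be open, let $f\colon U \to \mathbf{R}^n$ with $f(U) \subset V$ be differentiable at $a \in U$, and let $g\colon V \to \mathbf{R}^p$ be differentiable at $f(a)$. Then $g \circ f$ is differentiable at $a$ and

$$D(g \circ f)(a) = Dg(f(a)) \circ Df(a) \in \operatorname{Lin}(\mathbf{R}^m, \mathbf{R}^p).$$

Since $(Dg \circ f)(a) = Dg(f(a))$, this is the pointwise meaning of the shorthand $(Dg \circ f)\,Df$ used in the example.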

He then works out the following example:

[Image: Example 2.4.5 from the book, with the second equality underlined in yellow]

What confuses me is the second equality in Example 2.4.5 above (underlined in yellow). What exactly does it mean?

Here is my understanding: it is a "compact" way of writing functions. For example, fix any point $a \in \mathbf{R}$. Using the chain rule, the author is trying to convey that

$$D(g \circ f)(a) = \bigl(D_1g(f(a)), \cdots, D_ng(f(a))\bigr) \circ \begin{pmatrix} Df_1(a) \\ \vdots \\ Df_n(a) \end{pmatrix},$$

after which ordinary matrix multiplication gives the final result.
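To see the row-times-column computation at work, here is a small numerical check. The functions $f$ and $g$ below are hypothetical choices for illustration (not from the book), and all derivatives are approximated by central differences, so the two sides agree only up to rounding error.

```python
import math

# Hypothetical example (not from the book), with n = 2:
#   f: R -> R^2,  f(t) = (cos t, t^2)
#   g: R^2 -> R,  g(x, y) = x*y + sin x


def f(t):
    return [math.cos(t), t * t]


def g(x):
    return x[0] * x[1] + math.sin(x[0])


def partial(h, b, i, eps=1e-6):
    """Central-difference approximation of D_i h(b) for h: R^n -> R."""
    plus = list(b)
    minus = list(b)
    plus[i] += eps
    minus[i] -= eps
    return (h(plus) - h(minus)) / (2 * eps)


def deriv(phi, t, eps=1e-6):
    """Central-difference approximation of phi'(t) for phi: R -> R."""
    return (phi(t + eps) - phi(t - eps)) / (2 * eps)


def chain_rule_rhs(a):
    """Row (D_1 g(f(a)), ..., D_n g(f(a))) times column (Df_1(a), ..., Df_n(a))."""
    b = f(a)
    n = len(b)
    row = [partial(g, b, i) for i in range(n)]
    col = [deriv(lambda t, i=i: f(t)[i], a) for i in range(n)]
    return sum(row[i] * col[i] for i in range(n))


def chain_rule_lhs(a):
    """D(g o f)(a), obtained by differentiating the composite directly."""
    return deriv(lambda t: g(f(t)), a)


def exact_derivative(a):
    """Hand-computed derivative of g(f(t)) = t^2 cos t + sin(cos t)."""
    return -a * a * math.sin(a) + 2 * a * math.cos(a) - math.cos(math.cos(a)) * math.sin(a)


a = 0.7
print(chain_rule_lhs(a), chain_rule_rhs(a), exact_derivative(a))
```

All three numbers agree with the hand computation $D_1g(x,y) = y + \cos x$, $D_2g(x,y) = x$, evaluated at $f(a)$ and multiplied by $Df_1(a) = -\sin a$ and $Df_2(a) = 2a$.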

I don't think this is rigorous, for the following reasons:

  1. He writes $(Dg\circ f)$ as $((D_1g,\cdots,D_ng)\circ f)$. However, each $D_ig$ is a function from $\mathbf{R}^n$ to $\mathbf{R}$, and he simply arranges them in a row vector, as if they had already accepted $f$'s output as input and each returned a scalar in $\mathbf{R}$. I have never seen expressions like this before. Besides, how does one prove $Dg=(D_1g,\cdots,D_ng)$? Is the total derivative equal to some list of partial derivatives that was never defined?

  2. The same kind of expression appears in $\begin{pmatrix} Df_1 \\ \vdots \\ Df_n \end{pmatrix}$.

  3. To someone who does not already know what the author intends, these are all invalid expressions. So how should a beginner like me use such expressions correctly when proving more complicated statements?
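For points 1 and 2, here is the standard argument (a sketch, not taken from the book) that the total derivative does have the row of partial derivatives as its matrix, together with what the composite notation means. Suppose $g\colon \mathbf{R}^n \to \mathbf{R}$ is differentiable at $b$, so that $g(b+h) = g(b) + Dg(b)h + o(\|h\|)$ with $Dg(b) \in \operatorname{Lin}(\mathbf{R}^n, \mathbf{R})$. Taking $h = te_i$, where $e_i$ is the $i$-th standard basis vector, gives

$$Dg(b)e_i = \lim_{t \to 0} \frac{g(b + te_i) - g(b)}{t} = D_ig(b).$$

A linear map $\mathbf{R}^n \to \mathbf{R}$ is determined by its values on $e_1, \dots, e_n$, so its matrix is the row

$$Dg(b) = \bigl(D_1g(b), \ \dots, \ D_ng(b)\bigr), \qquad Dg(b)h = \sum_{i=1}^n D_ig(b)\, h_i .$$

So the partial derivatives are not an arbitrary list: they are the entries of the matrix of $Dg(b)$. Likewise, for $f = (f_1, \dots, f_n)\colon \mathbf{R} \to \mathbf{R}^n$ differentiable at $a$, the matrix of $Df(a)$ is the column with entries $Df_i(a) = f_i'(a)$. Finally, $Dg\colon \mathbf{R}^n \to \operatorname{Lin}(\mathbf{R}^n, \mathbf{R})$ is itself a function, so $Dg \circ f$ is the ordinary composite $a \mapsto Dg(f(a))$, and the row notation $(D_1g, \dots, D_ng) \circ f$ denotes the same function, $a \mapsto \bigl(D_1g(f(a)), \dots, D_ng(f(a))\bigr)$.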

Best Answer

The author implicitly identifies the derivative at a point $a$ with its matrix representation.

In fact, in Example 2.4.5 the author points out this identification:

the derivative $D(g \circ f)\colon \mathbf{R} \to \operatorname{End}(\mathbf{R}) \simeq \mathbf{R}$.
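The identification $\operatorname{End}(\mathbf{R}) \simeq \mathbf{R}$ can be made explicit with standard linear algebra (background, not quoted from the book). Every linear $\phi\colon \mathbf{R} \to \mathbf{R}$ satisfies $\phi(t) = t\,\phi(1)$, so

$$\operatorname{End}(\mathbf{R}) \to \mathbf{R}, \qquad \phi \mapsto \phi(1)$$

is a linear isomorphism, and a $1 \times 1$ matrix $[c]$ corresponds to multiplication by $c$. More generally, choosing the standard bases identifies $\operatorname{Lin}(\mathbf{R}^n, \mathbf{R}^p)$ with the $p \times n$ matrices, and composition of linear maps corresponds to matrix multiplication, $[B \circ A] = [B]\,[A]$. This correspondence is what allows the composition $\circ$ of derivatives to be computed as a matrix product.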

If I were writing it, I would state the equation in Example 2.4.5 as follows. For any $a \in \mathbf{R}$,

\begin{align}
[D(g \circ f)(a)]_{1 \times 1} &= [Dg(f(a))]_{1 \times n} \circ [Df(a)]_{n \times 1} \tag{Jacobian matrix notation} \\
&= \bigl[D_1g(f(a)), \cdots, D_ng(f(a))\bigr]_{1 \times n} \circ \begin{pmatrix} Df_1(a) \\ \vdots \\ Df_n(a) \end{pmatrix}_{n \times 1} \\
&= \sum_{1 \leq i \leq n} D_ig(f(a)) \cdot Df_i(a)
\end{align}
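The step from the composition $[Dg(f(a))]_{1 \times n} \circ [Df(a)]_{n \times 1}$ to the sum is just "composition of linear maps equals product of their matrices". Here is a minimal sketch of that fact in plain Python; the matrix entries are made-up numbers standing in for $D_ig(f(a))$ and $Df_i(a)$, and the helper names `apply`, `matmul`, and `composed` are hypothetical.

```python
# Hypothetical illustration: a linear map is stored as its matrix with respect to
# the standard bases (a list of rows); applying the map is matrix-vector multiplication.


def apply(M, v):
    """Apply the linear map with matrix M (p x n) to the vector v (length n)."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]


def matmul(A, B):
    """Matrix product AB, for A of size p x n and B of size n x m."""
    n = len(B)
    m = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(m)]
            for i in range(len(A))]


# Made-up values with n = 3:
#   Dg_row stands in for [Dg(f(a))]_{1x3} = (D_1 g(f(a)), D_2 g(f(a)), D_3 g(f(a)))
#   Df_col stands in for [Df(a)]_{3x1}    = (Df_1(a), Df_2(a), Df_3(a))^T
Dg_row = [[2.0, -1.0, 0.5]]
Df_col = [[3.0], [4.0], [-2.0]]


def composed(t):
    """The composite linear map Dg(f(a)) o Df(a): R -> R^3 -> R."""
    return apply(Dg_row, apply(Df_col, [t]))


product = matmul(Dg_row, Df_col)  # a 1 x 1 matrix
print(product, composed(1.0))  # prints [[1.0]] [1.0]
```

Since every linear map $\mathbf{R} \to \mathbf{R}$ has the form $t \mapsto ct$, `composed(t)` equals `product[0][0] * t` for every `t`, and `product[0][0]` is exactly $\sum_i D_ig(f(a))\,Df_i(a)$ for these entries: $2 \cdot 3 + (-1) \cdot 4 + 0.5 \cdot (-2) = 1$.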

In my view, the "compact" style of writing such equations is not good for learning.
