[Math] How to show Jacobian of a composite function is the product of Jacobians

derivativesjacobianlinear-transformations

Let $f: \mathbb{R}^m \rightarrow \mathbb{R^n}$ and $g : \mathbb{R^n} \rightarrow \mathbb{R^m}$ be two vector-valued functions. We want to show that

$$J_{f\circ g}(a)=J_f(g(a))Jg(a)$$

where $J$ is the Jacobian and $a$ is a point in $\mathbb{R^n}$.

Wikipedia Chain Rule page, at Higher dimensions section, has the following:

Let $D_a(g)$ denote the total derivative of $g$ at $a$ and $D_{g(a)}(f)$ denote the total derivative of $f$ at $g(a)$. These two derivatives are linear transformations. The chain rule for total derivatives says that their composite is the total derivative of $f \circ g$ at $a$, that is:

$$
D_a(f \circ g) = D_{g(a)}(f) \circ D_a(g) \tag{1}
$$

Concluding by the fact that the total derivative is Jacobian and since it is linear transformation, the composite of two total derivative becomes product of them.

My questions:

1- Why $(1)$ is true? prove this?

2- Alternative prove that start from scratch and shows $J_{f\circ g}(a)=J_f(g(a))Jg(a)$.

Best Answer

One idea is as follows. If $f$ and $g$ are differentiable, then by definition it exists a best linear approximation which is:

$$ g(x+h)=g(x)+dg[x](h)+o(\|h\|) $$

where $dg[x](h)$ denotes the action of the linear application $dg[x]$ on the vector $h$.

The notation $o(\|h\|)$ means that $$\lim\limits_{h\to 0} \frac{o(\|h\|)}{\|h\|}=\mathbf{0}_{\mathbb{R}^m}$$ Its interpretation is that for any constant $C$, $o(\|h\|)$ tends faster than $C.h$ to the zero vector, hence for small $h$ our linear approximation $dg[x].h$ is not "perturbed" by this reminder.

We can write the same thing for $f$: $$ f(y+k)=f(y)+df[y].k+o(\|k\|) $$

Now observe that:

\begin{align} f(g(x+h)) &=f(\overbrace{g(x)}^y+\overbrace{dg[x].h+o(\|h\|)}^k)) \\ &=f(g(x))+df[g(x)](dg[x].h+o(\|h\|))+o(\|dg(x).h+o(\|h\|)\|) \end{align}

Now from the definition of $o(.)$, it is not too hard to see that: $$ o(\|dg(x).h+o(\|h\|)\|)=o(\|h\|) $$ and (by linearity) = \begin{align} df[g(x)](dg[x].h+o(\|h\|)) &=df[g(x)](dg[x](h))+df[g(x)](o(\|h\|)) \\ &=df[g(x)](dg[x](h))+o(\|h\|) \end{align}

We finally get $$ f(g(x+h))=f(g(x))+df[g(x)](dg[x](h))+o(\|h\|) $$ and by identification $$d(f\circ g)[x]=df[g(x)]dg[x]$$ which is the expected result

Related Question