Computing the Jacobian $J_F$ with $F = h \circ f$

calculus, derivatives, jacobian, matrix-calculus

Let $f : \mathbb{R}^l \rightarrow{} \mathbb{R}^m$ and $h : \mathbb{R}^m \rightarrow{} \mathbb{R}^o$

and let $F = h \circ f$ with $F : \mathbb{R}^l \rightarrow{} \mathbb{R}^o$

I want to compute the Jacobian $J_F$ using forward-mode accumulation in a single pass.

I understand how forward-mode automatic differentiation works in simple cases.

As far as I know, if I could use as many passes as I wanted, I could compute $J_F$ with $l*m$ passes. Since I am constrained to a single pass, I get confused. I know I can do something with the initialization, but it is not clear to me. Could you please help me understand how to implement forward mode in just one pass, using the Jacobians of $f$ and $h$ along the way?

Thanks

Best Answer

It's just the matrix product of the two Jacobians. In fact, this can be seen as the reason matrix multiplication is defined the way it is. If you view an $m\times n$ matrix as a linear function from $\mathbb R^n \to \mathbb R^m$, then the Jacobian of that function is the matrix itself. Multiplying two matrices corresponds to composing the two linear functions, so the Jacobian of the composed function is the product of the two Jacobians.
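As a quick sanity check of the "composition equals matrix product" statement, here is a minimal sketch in plain Python. The matrices `A` and `B`, the vector `x`, and the helper names `matmul` and `apply` are made up for illustration; they are not from the question.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def apply(A, x):
    """Apply the linear map x -> A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Hypothetical example: B maps R^2 -> R^3, A maps R^3 -> R^2.
B = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
A = [[2.0, 0.0, 1.0], [1.0, -1.0, 0.0]]
x = [0.5, -2.0]

composed = apply(A, apply(B, x))   # first B, then A
product = apply(matmul(A, B), x)   # the single matrix AB applied once

print(composed, product)           # the two results agree
```

Since both maps are linear, `A` and `B` are their own Jacobians, and `matmul(A, B)` is the Jacobian of the composition.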

Locally (that is, in an infinitesimal neighborhood of a point), every differentiable function behaves like a linear function of the differentials (small displacements) of its inputs, and the matrix of that linear function is the Jacobian. You can therefore use matrices to describe the local behavior of any differentiable multivariable function.
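To make "locally linear" concrete, the following sketch compares $f(x + dx)$ with the first-order prediction $f(x) + J_f(x)\,dx$ for shrinking displacements. The function `f` and its hand-derived Jacobian `jac_f` are hypothetical examples chosen only for this illustration.

```python
import math

def f(x):
    """Hypothetical example map R^2 -> R^2."""
    return [x[0] * x[1], math.sin(x[0])]

def jac_f(x):
    """Hand-derived Jacobian of f at x."""
    return [[x[1], x[0]],
            [math.cos(x[0]), 0.0]]

def linearized(x, dx):
    """First-order prediction f(x) + J_f(x) dx."""
    J = jac_f(x)
    fx = f(x)
    return [fx[i] + sum(J[i][j] * dx[j] for j in range(len(dx)))
            for i in range(len(fx))]

x = [1.0, 2.0]
errors = []
for eps in (1e-1, 1e-2, 1e-3):
    dx = [eps, -eps]
    exact = f([x[0] + dx[0], x[1] + dx[1]])
    approx = linearized(x, dx)
    errors.append(max(abs(a - b) for a, b in zip(exact, approx)))

print(errors)  # the error shrinks much faster than eps itself
```

The error of the linear prediction falls off roughly quadratically in the size of `dx`, which is what it means for the Jacobian to capture the local behavior.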


Let's write the components of the function $f$ as $f^\mu$, where $\mu$ is an index that runs from $1$ to $m$. Similarly, write $g$ (the $h$ of the question) as $g^\mu$, with $\mu$ running from $1$ to $o$. In this way you can write a vector as a symbol with an index and treat each component as a number: for example, $f^1$ is the first element of $f(x)$, which lies in $\mathbb R$.

Then you can write the partial derivative $\frac{\partial f^\mu}{\partial x^\nu}$ as $f^\mu{}_{,\nu}$ (mind the comma). It is easy to see that $f^\mu{}_{,\nu}$ is exactly the Jacobian of $f$: for example, $f^1{}_{,2} \equiv \partial f^1/\partial x^2$ is the $(1,2)$-th element of the Jacobian, where $x^2$ denotes the second component of $x$, not a square. This notation is part of Ricci calculus.

The chain rule says: $$ \left(\frac{\partial}{\partial x^\nu}g(f(x))\right)^\mu=(g\circ f)^\mu{}_{,\nu}=\sum_\gamma g^\mu{}_{,\gamma}f^\gamma{}_{,\nu}\equiv g^\mu{}_{,\gamma}f^\gamma{}_{,\nu} $$ Here $g^\mu{}_{,\gamma}$ is evaluated at $f(x)$ and $f^\gamma{}_{,\nu}$ at $x$.

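The last step simplifies the notation by dropping the summation sign: in Einstein notation, an index that appears once as a superscript and once as a subscript is summed over. Einstein introduced this convention for manipulating tensors, and it is part of Ricci calculus as well.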

You can see that $\mu$ goes from $1$ to $o$, $\gamma$ goes from $1$ to $m$, and $\nu$ goes from $1$ to $l$. This immediately gives the matrix multiplication of the two Jacobians.
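In the question's notation this reads $J_F(x) = J_h(f(x))\,J_f(x)$. One way to turn that into a single forward sweep, assuming "one pass" means evaluating $f$ and $h$ only once, is to seed the tangent with the $l \times l$ identity matrix instead of a single vector and push all $l$ columns through together. The sketch below does this in plain Python; the concrete `f`, `h`, their hand-derived Jacobians, and the helper names `push`, `forward_jacobian` and `finite_difference_jacobian` are hypothetical examples, not part of the question.

```python
import math

def f(x):
    """Hypothetical example f: R^2 -> R^3 (l = 2, m = 3)."""
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

def jac_f(x):
    """Hand-derived Jacobian of f, shape m x l."""
    return [[x[1], x[0]],
            [math.cos(x[0]), 0.0],
            [0.0, 2.0 * x[1]]]

def h(y):
    """Hypothetical example h: R^3 -> R^2 (o = 2)."""
    return [y[0] + y[1] * y[2], y[0] * y[2]]

def jac_h(y):
    """Hand-derived Jacobian of h, shape o x m."""
    return [[1.0, y[2], y[1]],
            [y[2], 0.0, y[0]]]

def push(J, T):
    """Chain-rule step: (J T)[mu][nu] = sum over g of J[mu][g] * T[g][nu]."""
    return [[sum(J[mu][g] * T[g][nu] for g in range(len(T)))
             for nu in range(len(T[0]))] for mu in range(len(J))]

def forward_jacobian(x):
    """One forward sweep carrying the value and an l-column tangent matrix."""
    l = len(x)
    # Seed: identity matrix, i.e. all l unit tangent vectors at once.
    T = [[1.0 if i == j else 0.0 for j in range(l)] for i in range(l)]
    y = f(x)
    T = push(jac_f(x), T)   # now J_f(x), shape m x l
    z = h(y)
    T = push(jac_h(y), T)   # now J_h(f(x)) J_f(x), shape o x l
    return z, T

def finite_difference_jacobian(x, eps=1e-6):
    """Central differences of F = h o f, used only as an independent check."""
    cols = []
    for nu in range(len(x)):
        xp = list(x)
        xp[nu] += eps
        xm = list(x)
        xm[nu] -= eps
        Fp, Fm = h(f(xp)), h(f(xm))
        cols.append([(a - b) / (2 * eps) for a, b in zip(Fp, Fm)])
    # cols[nu][mu] holds dF^mu/dx^nu, so transpose to index as [mu][nu].
    return [list(row) for row in zip(*cols)]

x = [1.0, 2.0]
value, J_F = forward_jacobian(x)
J_fd = finite_difference_jacobian(x)
max_diff = max(abs(a - b) for ra, rb in zip(J_F, J_fd) for a, b in zip(ra, rb))

print(J_F)
print(max_diff)  # small: the sweep agrees with finite differences
```

The index ranges in `push` are exactly the ones listed above: `mu` over outputs, `g` over the intermediate dimension, `nu` over inputs. Carrying $l$ tangent columns costs roughly as much arithmetic as $l$ single-vector passes, but the function values are computed only once.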
