Multivariable Calculus – Applying Chain Rule for Directional Derivatives in Various Functions

Tags: derivatives, jacobian, multivalued-functions, multivariable-calculus, partial-derivative

Given $f:\mathbb{R}^m\to \mathbb{R}^n$, $g:\mathbb{R}^n\to\mathbb{R}^q$, is the following statement for the directional derivative ($\mathbf{v}\in\mathbb{R}^m$) correct?
$$\partial_{\mathbf{v}}(\mathbf{g}\circ\mathbf{f})=\big((D\mathbf{g})\circ \mathbf{f}\big)\partial_{\mathbf{v}}\mathbf{f},$$
where $\partial_{\mathbf{v}}$ denotes the directional derivative and $D\mathbf{g}$ the Jacobian matrix.

Best Answer

As long as you interpret the RHS correctly regarding where the point of evaluation goes, then yes, it is correct. If we explicitly mention the point of evaluation, and we are super pedantic about the order of evaluation, then for every $\xi \in \mathbb{R}^m$, we have

\begin{align} \left(\partial_v(g \circ f) \right)(\xi) &= \left[ \left((Dg)\circ f \right)(\xi) \right] \cdot \left((\partial_vf)(\xi) \right) \\ &= \left[ Dg(f(\xi)) \right] \cdot \left((\partial_vf)(\xi) \right) \end{align}

where the $\cdot$ means matrix multiplication.
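
As a sanity check, the pointwise formula can be tested numerically. The sketch below is only illustrative: the maps `f` and `g`, the point `xi`, and the direction `v` are hypothetical choices that do not come from the question, `Dg` is the Jacobian of that particular `g` worked out by hand, and the directional derivatives are approximated by central differences rather than computed exactly.

```python
import math

def f(x):
    # Hypothetical example map f : R^2 -> R^3
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x2 ** 2]

def g(y):
    # Hypothetical example map g : R^3 -> R^2
    y1, y2, y3 = y
    return [y1 + y2 * y3, math.exp(y1) - y3]

def Dg(y):
    # Jacobian of the g above, derived by hand: a 2 x 3 matrix
    y1, y2, y3 = y
    return [[1.0, y3, y2],
            [math.exp(y1), 0.0, -1.0]]

def directional_derivative(h, xi, v, eps=1e-6):
    # Central-difference approximation of (d_v h)(xi)
    plus = h([a + eps * b for a, b in zip(xi, v)])
    minus = h([a - eps * b for a, b in zip(xi, v)])
    return [(p - m) / (2 * eps) for p, m in zip(plus, minus)]

def matvec(A, eta):
    # Matrix-vector product A . eta
    return [sum(a * e for a, e in zip(row, eta)) for row in A]

xi = [0.7, -1.3]   # point of evaluation in R^2
v = [2.0, 0.5]     # direction in R^2

# Left-hand side: (d_v (g o f))(xi)
lhs = directional_derivative(lambda x: g(f(x)), xi, v)
# Right-hand side: [Dg(f(xi))] . (d_v f)(xi)
rhs = matvec(Dg(f(xi)), directional_derivative(f, xi, v))

max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
print(lhs, rhs, max_gap)
```

The two sides should agree up to finite-difference error. Note that `Dg` is evaluated at `f(xi)`, not at `xi`, which is exactly the point about the order of evaluation.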


The reason I used the word "interpret" above is that, from a purely technical standpoint, when you leave out the variable $\xi$, you need to ensure that both sides of the equation are functions with the same domain and target space. In this example, we have the following domains and target spaces:

  • $\partial_v(g \circ f) : \mathbb{R}^m \to \mathbb{R}^q$
  • $(Dg) \circ f : \mathbb{R}^m \to M_{q \times n}(\mathbb{R})$
  • $\partial_vf : \mathbb{R}^m \to \mathbb{R}^n$

So, strictly speaking, the RHS of your equation is not properly defined. If we wanted to be super formal and write the equation above in a correct form, without explicitly mentioning the variable $\xi$, then we would have to introduce the following "auxiliary" functions:

  • $\omega : M_{q \times n}(\mathbb{R}) \times \mathbb{R}^n \to \mathbb{R}^q$, defined by $\omega(A, \eta) = A \cdot \eta$.
  • $\iota_1 : M_{q \times n}(\mathbb{R}) \to M_{q \times n}(\mathbb{R}) \times \mathbb{R}^n$, defined by $\iota_1(A) = (A,0)$.
  • $\iota_2 : \mathbb{R}^n \to M_{q \times n}(\mathbb{R}) \times \mathbb{R}^n$, defined by $\iota_2(\eta) = (0,\eta)$.

Here, $\omega$ is a sort of "evaluation map", which evaluates the matrix $A$ on the vector $\eta$ by multiplication. $\iota_1$ and $\iota_2$ are the "canonical injections", which allow us to think of an element as being part of a larger product space. With this, we can then write the precise (but cumbersome) statement:

\begin{equation} \partial_v(g \circ f) = \omega \circ \left( \iota_1 \circ (Dg) \circ f + \iota_2 \circ \partial_v f \right) \end{equation}

On the RHS, you can see that both $\iota_1 \circ (Dg) \circ f$ and $\iota_2 \circ \partial_v f$ are functions from $\mathbb{R}^m$ into $M_{q \times n}(\mathbb{R}) \times \mathbb{R}^n$, so their sum is a function of the same kind. Thus, composing this sum with $\omega$ makes sense and the result is a function from $\mathbb{R}^m$ into $\mathbb{R}^q$; this agrees with the LHS.
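
To make the point-free statement concrete, here is a minimal sketch that builds the right-hand side purely out of function composition and pointwise addition, with no point of evaluation appearing until the very end. Everything specific in it is an assumption made for illustration: the maps `f` and `g` (both $\mathbb{R}^2 \to \mathbb{R}^2$), the hand-derived `Dg` and `dv_f`, the helpers `compose` and `add_maps`, and the representation of an element of $M_{q \times n}(\mathbb{R}) \times \mathbb{R}^n$ as a Python tuple `(A, eta)`.

```python
Q, N = 2, 2  # g : R^N -> R^Q in this hypothetical example

def f(x):
    # Hypothetical f : R^2 -> R^2
    return [x[0] ** 2, x[0] * x[1]]

def dv_f(v):
    # Returns the map xi -> (d_v f)(xi) = Df(xi) . v, derived by hand for the f above
    def at(xi):
        return [2 * xi[0] * v[0], xi[1] * v[0] + xi[0] * v[1]]
    return at

def Dg(y):
    # Jacobian of the hypothetical g(y) = [y1 * y2, y1 + y2]
    return [[y[1], y[0]], [1.0, 1.0]]

def omega(pair):
    # Evaluation map: (A, eta) -> A . eta
    A, eta = pair
    return [sum(a * e for a, e in zip(row, eta)) for row in A]

def iota1(A):
    # Canonical injection A -> (A, 0)
    return (A, [0.0] * N)

def iota2(eta):
    # Canonical injection eta -> (0, eta)
    return ([[0.0] * N for _ in range(Q)], eta)

def compose(*fns):
    # compose(h3, h2, h1)(x) == h3(h2(h1(x)))
    def composed(x):
        for fn in reversed(fns):
            x = fn(x)
        return x
    return composed

def add_maps(F, G):
    # Pointwise sum of two maps into M_{Q x N} x R^N
    def summed(x):
        (A1, e1), (A2, e2) = F(x), G(x)
        A = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
        e = [a + b for a, b in zip(e1, e2)]
        return (A, e)
    return summed

v = [1.0, -2.0]
# omega o (iota1 o Dg o f + iota2 o d_v f), assembled without mentioning xi
chain = compose(omega, add_maps(compose(iota1, Dg, f), compose(iota2, dv_f(v))))

xi = [3.0, 0.5]
result = chain(xi)
print(result)  # [-40.5, 0.5]
```

For this choice, $(g \circ f)(x) = (x_1^3 x_2,\; x_1^2 + x_1 x_2)$, and differentiating that directly at $\xi = (3, 0.5)$ in the direction $v = (1, -2)$ gives the same vector $(-40.5,\, 0.5)$.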


Final Remarks:

  • Comparing the two displayed equations (the pointwise one and the point-free one), you can see that it is easier to formulate this directional derivative chain rule in a pointwise manner by including $\xi$ everywhere. I only brought up this technical detail for your awareness; in practice, people do abuse notation and write it in exactly the form you have written it, mentally keeping track of (or deducing from context) how things are to be evaluated.
  • One last nitpick :) in your title, "multi-valued" functions is incorrect terminology, because the function is single-valued. What you meant to say was something like "multivariate function with values in $\mathbb{R}^q$"