[Math] How to understand the gradient and Jacobian of a function $\vec{f}:\mathbb{R}^m\to\mathbb{R}^n$

calculus, real-analysis

How to understand the gradient of a function $\vec{f}:\mathbb{R}^m\to\mathbb{R}^n$? I am totally comfortable with the gradient $\nabla f$ where $f:\mathbb{R}^m \to \mathbb{R}$. The output is just a vector in $\mathbb{R}^m$.

However, suppose we have a function $\vec{f}:\mathbb{R}^m \to \mathbb{R}^n$. The definition I have defines the gradient of $\vec{f}$ as $\nabla \vec{f}(\vec{x}) =(\nabla f_1 (\vec{x}), \ldots, \nabla f_n (\vec{x}))$. I have some difficulty understanding this. Apparently $\nabla \vec{f}(\vec{x})$ can be written in matrix form and it should represent a linear transformation from $\mathbb{R}^m\to\mathbb{R}^n$, but what exactly does it represent?

My second question: the author also defines the Jacobian matrix of $\vec{f}$ as $(\nabla \vec{f}(\vec{x}))^T$, and denotes it by $\frac{\partial \vec{f}}{\partial \vec{x}}$. I understand the Jacobian in the $\mathbb{R}^m\to\mathbb{R}^n$ case, but how should I understand this?

Best Answer

$\require{amsmath} $ This can require a longish answer, depending on where we're starting from - or how much you want! I hope you are familiar with matrices, vectors - in a slightly abstract form - and linear transformations.

Depending on the context, there is at least one 'a priori' definition for the gradient $\nabla f$ of $f\colon \mathbb R^n \rightarrow \mathbb R$, other than the vector of partial derivatives $\partial f/ \partial x_i$, although any definition should match up with that one in this context. Suppose we have a nice path

$$ \gamma \colon \mathbb R \rightarrow \mathbb R^n.$$

Then, $f\circ \gamma \colon \mathbb R \rightarrow \mathbb R$ is a calculus one style function, and if $\gamma ( 0 ) = p \in \mathbb R^n$,

$$ (f\circ \gamma )' ( 0 ) = \sum_{i=1}^n {\partial f \over \partial x_i} (p) \ \gamma_i'(0). $$

In one interpretation - the one you have in mind, I think, and the usual one when one thinks 'gradient' - the sum on the right is a dot product between the gradient vector $\nabla f|_p$ and the vector $\gamma'(0)$:

$$ (f\circ \gamma )'\, ( 0 ) = \nabla f|_p \cdot \gamma'(0).\tag{*}$$

Another interpretation is to write the sum as a matrix multiplication:

$$\left({\partial f \over \partial x_1} (p), \cdots, {\partial f \over \partial x_n} (p) \right) \left(\matrix{ \gamma_1'(0) \\ \vdots \\\gamma_n'(0) }\right), $$

i.e., to think of the row matrix of partials of $f$ - let's denote it $f'(p)$ - applied to the (column) vector $\gamma'(0)$:

$$ (f\circ \gamma )' \,( 0 ) = f'(p) \ \gamma'(0). \tag{**}$$

On the one hand, with this formulation, we are no longer thinking of the collection of partials as a vector (the gradient $\nabla f|_p$), but as a linear transformation, $f'(p)$, applied to a tangent vector $\gamma'(0)$ at $p$, whose image is another tangent vector, $(f\circ \gamma )' \,( 0 )$, at $(f\circ \gamma )\,(0) = f(p)$.
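If it helps to see $(*)$ with numbers, here is a small sketch in plain Python. The particular $f$, the path $\gamma$, and the helper names are my own choices for illustration, not anything canonical; the point is only that a calculus-one difference quotient of $f\circ\gamma$ at $t=0$ agrees with the dot product $\nabla f|_p \cdot \gamma'(0)$.

```python
import math

def f(x):
    # Illustrative scalar field f: R^2 -> R (an arbitrary choice)
    return x[0] ** 2 * x[1] + math.sin(x[1])

def grad_f(x):
    # Partial derivatives of f, computed by hand
    return [2 * x[0] * x[1], x[0] ** 2 + math.cos(x[1])]

def gamma(t):
    # Illustrative path with gamma(0) = p = (1, 2)
    return [1 + 3 * t, 2 - t + t ** 2]

# Velocity of gamma at t = 0, computed by hand
GAMMA_PRIME_0 = [3.0, -1.0]

def central_difference(h, t, eps=1e-6):
    # Calculus-one style numerical derivative of h: R -> R at t
    return (h(t + eps) - h(t - eps)) / (2 * eps)

p = gamma(0.0)

# Left-hand side of (*): derivative of the one-variable function f o gamma
lhs = central_difference(lambda t: f(gamma(t)), 0.0)

# Right-hand side of (*): dot product of the gradient at p with gamma'(0)
rhs = sum(g_i * v_i for g_i, v_i in zip(grad_f(p), GAMMA_PRIME_0))

print(lhs, rhs)  # the two values should agree up to finite-difference error
```

Reading `grad_f(p)` as a row and `GAMMA_PRIME_0` as a column gives exactly the same sum, which is the content of $(**)$.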

On the other, this formulation suggests "chain rule" - does it not?

With that in mind, if $ f \colon \mathbb R^n \rightarrow \mathbb R^m$ (note that $m$ and $n$ are swapped relative to your question), the above suggests that the matrix, let's call it $f'(p)$,

$$ \pmatrix { {\partial f_1\over \partial x_1}(p) &\cdots &{\partial f_1\over \partial x_n}(p)\cr \vdots & & \vdots \cr {\partial f_m\over \partial x_1}(p) &\cdots &{\partial f_m\over \partial x_n}(p) } $$

is a linear transformation, taking tangent vectors at $p$ to tangent vectors at $q=f(p)$. Namely, $t \mapsto f\circ \gamma (t)$ is a curve in $\mathbb R^m$, with tangent vector at $q$,

$$( f\circ\gamma )'\,(0) = f'(p)\ \gamma'(0),$$

where the multiplication on the right is matrix multiplication. To keep to the calculus one formulation one should also consider a function $g \colon \mathbb R^m \rightarrow \mathbb R$: then $t \mapsto (g\circ f \circ \gamma) (t) $ is an ordinary real-valued function, and has a 'cal one' style derivative at $t=0$, but one calculates it - chain rule style in higher dimensions - by:

$$ (g\circ f \circ \gamma)'\, (0) = g'(q)\ f'(p)\ \gamma'(0),$$

where the multiplication on the RHS is matrix multiplication.
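The chain-rule product $g'(q)\ f'(p)\ \gamma'(0)$ can be checked the same way. This is only a sketch under my own assumptions: the maps $f\colon \mathbb R^2 \to \mathbb R^3$, $g\colon \mathbb R^3 \to \mathbb R$, the path, and all function names below are made up for the illustration, and the matrices of partials are entered by hand.

```python
def f(x):
    # Illustrative map f: R^2 -> R^3
    return [x[0] * x[1], x[0] + x[1], x[0] ** 2]

def jacobian_f(x):
    # f'(x) as a 3x2 matrix: row i holds the partials of f_i
    return [[x[1], x[0]],
            [1.0, 1.0],
            [2 * x[0], 0.0]]

def g(y):
    # Illustrative scalar function g: R^3 -> R
    return y[0] * y[2] + y[1] ** 2

def row_g(y):
    # g'(y) as a 1x3 row matrix of partials
    return [y[2], 2 * y[1], y[0]]

def gamma(t):
    # Illustrative path in R^2 with gamma(0) = p = (1, 3)
    return [1 + 2 * t, 3 - t]

# Velocity of gamma at t = 0, computed by hand
GAMMA_PRIME_0 = [2.0, -1.0]

def mat_vec(matrix, vector):
    # Matrix times column vector
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def central_difference(h, t, eps=1e-6):
    # Calculus-one style numerical derivative of h: R -> R at t
    return (h(t + eps) - h(t - eps)) / (2 * eps)

p = gamma(0.0)
q = f(p)

# f'(p) gamma'(0): the tangent vector at q of the curve f o gamma
pushed = mat_vec(jacobian_f(p), GAMMA_PRIME_0)

# g'(q) f'(p) gamma'(0): a 1x3 row times a 3-vector, i.e. a number
chain = sum(a * b for a, b in zip(row_g(q), pushed))

# The same number as an ordinary derivative of t -> g(f(gamma(t)))
numeric = central_difference(lambda t: g(f(gamma(t))), 0.0)

print(pushed, chain, numeric)
```

Here `pushed` is a vector in $\mathbb R^3$ (a tangent vector at $q$), while `chain` is a single number, matching the shapes $1\times 3$, $3\times 2$, $2\times 1$ in the product.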

To return more explicitly to your question - I believe that it is somewhat unusual to use gradient notation in the context of the Jacobian matrix: 'usually' gradient means there is a 'dot product' type of context...
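To connect this with the notation in your question, here is a small worked example. It is a sketch that assumes your author writes each $\nabla f_i$ as a column, which is what makes the transpose come out as the usual Jacobian; the particular map is just my own illustration. Take $\vec f\colon \mathbb R^2 \to \mathbb R^3$ with $\vec f(x,y) = (xy,\ x+y,\ x^2)$, so that $\nabla f_1 = (y, x)$, $\nabla f_2 = (1,1)$ and $\nabla f_3 = (2x, 0)$. Then

$$ \nabla \vec f(x,y) = \pmatrix{ y & 1 & 2x \cr x & 1 & 0 }, \qquad {\partial \vec f \over \partial \vec x} = \left(\nabla \vec f(x,y)\right)^T = \pmatrix{ y & x \cr 1 & 1 \cr 2x & 0 }. $$

The second matrix has one row per component $f_i$ and one column per variable, and it is the one that sends a tangent vector in $\mathbb R^2$ to a tangent vector in $\mathbb R^3$. The first matrix carries the same information with the gradients stored as columns.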

To illustrate: in $\mathbb R^3$, as you know, if $f\colon \mathbb R^3 \to \mathbb R $ is a nice map with $f(p) = b$ and $\nabla f|_p \neq 0$, then $\nabla f|_p$ is normal to the tangent space at $p$ of the level surface $f(x) =b$ (its dot product with every tangent vector is $0$). It also points in the direction of maximal increase of $f$, and its length is the maximal rate of change - a vector, i.e., the gradient, of course.
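The normality claim follows directly from $(*)$; the concrete $f$, $p$ and $v$ in the second half are my own illustrative choices. If $\gamma$ is a nice path lying in the level surface $f(x) = b$ with $\gamma(0) = p$, then $f\circ\gamma$ is the constant $b$, so

$$ 0 = (f\circ \gamma )'\,(0) = \nabla f|_p \cdot \gamma'(0). $$

For instance, with $f(x,y,z) = x^2+y^2+z^2$, $p = (1,2,2)$ and $b = 9$, we get $\nabla f|_p = (2,4,4)$. The vector $v = (2,-1,0)$ is tangent to the sphere at $p$ (since $p \cdot v = 0$), and indeed

$$ \nabla f|_p \cdot v = 2\cdot 2 + 4\cdot(-1) + 4\cdot 0 = 0. $$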

[Also (more generally?), differential geometry often comes with dot-product structures on tangent spaces - e.g., one can say that a tangent plane of a sphere inherits a dot product from the one in the ambient $\mathbb R^3$ - and one can talk of the gradient of a function $f$ at $p$ as the tangent vector $({\rm grad} f)(p)$ on some geometric space (e.g., the surface of a sphere $S$) which satisfies $({\rm grad} f)(p) \cdot v = f'(p) v$, for all tangent vectors $v$ to $S$ at $p$, and where the $\cdot$ is the dot product, and where I am understanding $f'(p)$ as I have used it in this answer - as a derivative/linear map.]
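A concrete instance of that last definition, as a sketch with an example of my own choosing: let $S$ be the unit sphere in $\mathbb R^3$ and $f(x,y,z) = z$ (the height function), so that $f'(p) v = v_3$ for a tangent vector $v$ at $p = (p_1,p_2,p_3)$. Writing $e_3 = (0,0,1)$, the candidate

$$ ({\rm grad} f)(p) = e_3 - p_3\, p $$

is tangent to $S$ at $p$, because $(e_3 - p_3\, p)\cdot p = p_3 - p_3\,(p\cdot p) = 0$, and for every tangent vector $v$ (i.e., $p \cdot v = 0$) it satisfies

$$ ({\rm grad} f)(p) \cdot v = e_3\cdot v - p_3\,(p\cdot v) = v_3 = f'(p)\, v. $$

So here the gradient on $S$ is the ambient gradient $e_3$ with its normal component removed.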

This answer uses the $f'$ notation to emphasize the chain rule... But, depending on context, the (various?) $f'(p)$ of this answer is (are?) also often denoted $ df_p$, $\partial f/ \partial x|_p$, $Df_p$, and so on... For instance, in this answer $\gamma' (0)$ showed up as a tangent vector - but drinking my own Kool Aid, one could equally well think of it as a linear map from the tangent space of $\mathbb R$ at $t = 0$...

The same ambiguity already shows up in cal one: when we write $f'(0)$, do we mean the slope of the tangent line - i.e., a $1\times 1$ matrix, a linear map - or a number (a rate of change)?

In any event, the identification of gradient with matrix is the identification of the right hand sides of equations $(*)$ and $(**)$.

As you can see, there is a Tower of Babel of identifications going on, and one has to be clear about which one is meant. One could 'choose' non-conflicting notation, but there is a lot of history here, and one shouldn't be arthritic...

Hope this helps.