Jacobian matrix of a function and bases

linear-algebra, linear-transformations, multivariable-calculus, real-analysis

Let $V,W$ be finite-dimensional vector spaces over the field
$\mathbb{k}$ and assume that $\mathcal{A}:V\to W$ is a linear map. If
we fix bases $\{e_i\}_{i=1}^{m}$ in $V$ and $\{f_j\}_{j=1}^n$ in $W$,
then $\mathcal{A}$ is represented by a unique matrix $A$. Taking different
bases $\{e'_i\}_{i=1}^{m}$ in $V$ and $\{f'_j\}_{j=1}^n$ in $W$, we
obtain a different matrix $A'$. It is easy to find the relation
between the matrices $A$ and $A'$.
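For reference (with notation I am introducing here): if $T$ is the matrix whose columns give the coordinates of the $e'_i$ in the basis $\{e_i\}$, and $S$ does the same for the $f'_j$ relative to $\{f_j\}$, the relation is
$$ A' = S^{-1} A\, T. $$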

Let's take a look at the definition of a differentiable function of several variables.

Let $E\subset \mathbb{R}^m$ be an open set. We say that $f:E\to
\mathbb{R}^n$ is differentiable at $x\in E$ if there are a linear map
$L(x):\mathbb{R}^m\to \mathbb{R}^n$ and a function
$\alpha:\mathbb{R}^m\to \mathbb{R}^n$ such that
$$f(x+h)-f(x)=L(x)h+\alpha(h)\lVert h\rVert$$ holds for all $h$ in some deleted
neighborhood of $0\in \mathbb{R}^m$ and $\lim\limits_{h\to 0}\alpha(h)=0$.
The linear map $L(x):\mathbb{R}^m\to \mathbb{R}^n$ is
called the differential of $f$ at $x$ and is denoted by $df(x)$ or
$f'(x)$.
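For example, to fix ideas, take $f:\mathbb{R}^2\to\mathbb{R}^2$, $f(x,y) = (x^2,\, xy)$. At a point $(a,b)$,
$$ f(a+h_1,\, b+h_2) - f(a,b) = (2a h_1,\; b h_1 + a h_2) + (h_1^2,\; h_1 h_2), $$
and the second term has norm at most $\lVert h\rVert^2$, so $df(a,b)$ is the linear map $h \mapsto (2a h_1,\, b h_1 + a h_2)$.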

One can perform the computations and show that $df(x)$ is represented by the following matrix
$$\begin{pmatrix}
\partial_1f^1(x) & \partial_2f^1(x) & \dots & \partial_mf^1(x) \\
\vdots & \vdots & \ddots & \vdots \\
\partial_1f^n(x) & \partial_2f^n(x) & \dots & \partial_mf^n(x)
\end{pmatrix},$$

which is called the Jacobian matrix of $f$ at $x$.
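As a quick numerical sanity check (a minimal sketch of my own, assuming NumPy and the example map $f(x,y) = (x^2, xy)$ from above), the columns of this matrix can be approximated by finite differences along the standard basis vectors:

```python
import numpy as np

def f(v):
    x, y = v
    return np.array([x**2, x*y])

def numerical_jacobian(f, x, eps=1e-6):
    """Finite-difference approximation of the Jacobian of f at x:
    column j is the difference quotient along the j-th standard basis vector."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        e_j = np.zeros(x.size)
        e_j[j] = 1.0
        J[:, j] = (f(x + eps * e_j) - fx) / eps
    return J

a = np.array([1.0, 2.0])
print(numerical_jacobian(f, a))                   # approximately [[2, 0], [2, 1]]
print(np.array([[2*a[0], 0.0], [a[1], a[0]]]))    # analytic Jacobian at (1, 2)
```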

I remember how this matrix was derived, but I am asking myself the following question.

Question. For which bases in $\mathbb{R}^m$ and $\mathbb{R}^n$ is this matrix derived? Maybe for the standard bases of $\mathbb{R}^m$ and $\mathbb{R}^n$? I don't remember this fact about the bases being used anywhere in the computation of the Jacobian matrix.

Thank you for your help!

Best Answer

In the standard derivation of the Jacobian, there is no need to pay attention to the particular bases of the domain and codomain. So, the literal answer to "where do we use the fact that these bases are standard" is "nowhere".

I think that your implicit question is the following: given bases $\mathcal B_1 = \{v_1,\dots,v_m\}$ of $\Bbb R^m$ and $\mathcal B_2 = \{w_1,\dots,w_n\}$ of $\Bbb R^n$, how would we compute the matrix of $df(x)$ relative to these bases? With that, it is easy to see why the standard bases yield the Jacobian matrix.

When discussing the entries of a matrix, it is convenient to introduce dual bases. We say that a set $\mathcal B_2^* = \{\beta_1,\dots,\beta_n\}$ of linear functions $\beta_i:\Bbb R^n \to \Bbb R$ is the dual basis to $\mathcal B_2$ if we have $$ \beta_i(w_j) = \begin{cases} 1 & i=j\\ 0 & i \neq j. \end{cases} $$ The role of the dual basis is to extract the individual components of a vector in $\Bbb R^n$ corresponding to the elements of $\mathcal B_2$. In other words, if $w = x_1 w_1 + \cdots + x_n w_n$, then $\beta_i(w) = x_i$ (for $i = 1,\dots,n$).
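For instance (a small illustration of my own): take $w_1 = (1,0)$ and $w_2 = (1,1)$ in $\Bbb R^2$. Then
$$ \beta_1(x,y) = x - y, \qquad \beta_2(x,y) = y, $$
since for $w = x_1 w_1 + x_2 w_2 = (x_1 + x_2,\, x_2)$ these functionals return $\beta_1(w) = x_1$ and $\beta_2(w) = x_2$.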

For a linear map $L:\Bbb R^m \to \Bbb R^n$, the $i,j$ entry of the matrix $[L]^{\mathcal B_1}_{\mathcal B_2}$ (of $L$ relative to the bases $\mathcal B_1, \mathcal B_2$) is given by $\beta_i(L(v_j)).$
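Here is a short numerical illustration of this formula (a sketch of my own, with made-up $A$, $V$, $W$): if $L$ is given by its matrix $A$ in the standard bases, the columns of $V$ are the $v_j$, and the columns of $W$ are the $w_i$, then the dual functionals $\beta_i$ are the rows of $W^{-1}$, and the entries $\beta_i(L(v_j))$ assemble into the product $W^{-1} A V$:

```python
import numpy as np

# Hypothetical example: L : R^2 -> R^3 given by its matrix A in the standard bases.
A = np.array([[1.0, 2.0],
              [0.0, 3.0],
              [4.0, 1.0]])

# Columns of V are v_1, v_2 (basis B_1 of R^2); columns of W are w_1, w_2, w_3 (basis B_2 of R^3).
V = np.array([[1.0, 1.0],
              [0.0, 1.0]])
W = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])

W_inv = np.linalg.inv(W)   # row i of W^{-1} is the dual functional beta_i

# Entry by entry: m_ij = beta_i(L(v_j)) ...
M = np.array([[W_inv[i] @ (A @ V[:, j]) for j in range(2)] for i in range(3)])

# ... which is the same as the change-of-basis product W^{-1} A V.
print(np.allclose(M, W_inv @ A @ V))   # True
print(M)
```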

Note that the directional derivative of $f$ along the vector $v$ is given by the limit $$ D_vf(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}. $$ Usually, when this concept is introduced, $f$ is scalar valued and $v$ is a unit vector, but neither restriction is needed here. From your definition, we are given that $$ f(x+h) - f(x) = df(x)h + o(\|h\|), $$ so the above limit becomes $$ D_vf(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t} = \lim_{t \to 0}\frac{t\,df(x)(v) + o(t)}{t} = df(x)(v). $$

With that stated, the $i,j$ entry $m_{ij}$ of $M = [df(x)]^{\mathcal B_1}_{\mathcal B_2}$ is given by $$ m_{ij} = \beta_i(df(x)(v_j)) = \beta_i[D_{v_j}f(x)]. $$

So, what's special about the standard bases? If we take $\mathcal B_1$ to be the standard basis, then the directional derivative $D_{v_j}f(x)$ is simply the partial derivative of $f$ with respect to $x_j$. That is, we have $$ m_{ij} = \beta_i[D_{v_j}f(x)] = \beta_i[\partial_j f(x)]. $$ If we then take $\mathcal B_2$ to be the standard basis, then the dual basis $\mathcal B_2^*$ consists of the component functions $\beta_i(x_1,\dots,x_n) = x_i$. That is, we have $$ m_{ij} = \beta_i[\partial_j f(x)] = \partial_j f^i(x). $$ So, when $\mathcal B_1,\mathcal B_2$ are the standard bases, $[df(x)]_{\mathcal B_2}^{\mathcal B_1}$ is the Jacobian matrix of $f$.
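To see this recipe in action numerically (again a sketch of my own, reusing the example $f(x,y) = (x^2, xy)$ and some arbitrarily chosen bases $V$, $W$): computing $m_{ij} = \beta_i[D_{v_j}f(x)]$ by finite differences reproduces the change-of-basis product $W^{-1} J\, V$, and returns the Jacobian $J$ itself when both bases are standard:

```python
import numpy as np

def f(v):
    x, y = v
    return np.array([x**2, x*y])

def directional_derivative(f, x, v, eps=1e-6):
    """Finite-difference approximation of D_v f(x)."""
    return (f(x + eps * v) - f(x)) / eps

x0 = np.array([1.0, 2.0])
J = np.array([[2*x0[0], 0.0],      # analytic Jacobian of f at x0
              [x0[1],  x0[0]]])

# Arbitrary (non-standard) bases: columns of V for the domain, columns of W for the codomain.
V = np.array([[1.0, 1.0],
              [0.0, 1.0]])
W = np.array([[2.0, 0.0],
              [1.0, 1.0]])
W_inv = np.linalg.inv(W)           # rows are the dual functionals beta_i

# m_ij = beta_i(D_{v_j} f(x0)), computed numerically ...
M = np.array([[W_inv[i] @ directional_derivative(f, x0, V[:, j])
               for j in range(2)] for i in range(2)])

# ... agrees with the change-of-basis formula W^{-1} J V.
print(np.allclose(M, W_inv @ J @ V, atol=1e-4))   # True

# With the standard bases the same recipe returns the Jacobian itself.
E = np.eye(2)
M_std = np.array([[directional_derivative(f, x0, E[:, j])[i]
                   for j in range(2)] for i in range(2)])
print(np.allclose(M_std, J, atol=1e-4))           # True
```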


An alternative approach: as we established above, the $i,j$ entry of the matrix $M = [df(x)]^{\mathcal B_1}_{\mathcal B_2}$ is given by $$ m_{ij} = \beta_i(df(x)(v_j)) = (\beta_i \circ df(x))(v_j). $$ In the case where $\mathcal B_2$ is the standard basis, the dual basis elements are simply the standard coordinate projections $\beta_i = \pi^i$ (as you correctly note in your comment below). I claim that $$ \pi^i \circ df(x) = df^i(x). $$ To see this, apply $\pi^i$ to the expansion of $f$ from your definition: $$ (\pi^i \circ f)(x + h) - (\pi^i \circ f)(x) = f^i(x + h) - f^i(x) = \pi^i(df(x)(h)) + o(\|h\|), $$ so the linear map $\pi^i \circ df(x)$ satisfies the defining property of $df^i(x)$. From there, we have $$ m_{ij} = (\beta_i \circ df(x))(v_j) = df^i(x)(v_j). $$

Now, we can note that for any scalar-valued function $g$, $dg(x)(v)$ is simply the directional derivative of $g$ along $v$. Indeed, $$ D_vg(x) = \lim_{t \to 0} \frac{g(x + tv) - g(x)}{t} = \lim_{t \to 0}\frac{t\,dg(x)(v) + o(t)}{t} = dg(x)(v). $$ Now, if $\mathcal B_1 = \{e_1,\dots,e_m\}$ is the standard basis of $\Bbb R^m$, then we have \begin{align} m_{ij} &= df^i(x)(e_j) = D_{e_j}f^i(x) = \partial_j f^i(x). \end{align}
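To make this row-by-row picture concrete (my addition, using the same example as above): for $f(x,y) = (x^2, xy)$ the component differentials are
$$ df^1(x,y) = \begin{pmatrix} 2x & 0 \end{pmatrix}, \qquad df^2(x,y) = \begin{pmatrix} y & x \end{pmatrix}, $$
and evaluating them on the standard basis vectors $e_1, e_2$ fills in the rows $\partial_j f^i$ of the Jacobian matrix.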