[Math] How to compute the directional derivative of a vector field

matrix-calculus, partial derivative, tensor-products, vector analysis, vector fields

Suppose we are given a vector field $\vec{a}$ such that

$$\vec{a}(x_1,\ldots,x_n)=\sum_{i=1}^{k}f_i(x_1,\ldots,x_n)\vec{e_i} $$

where

$$\mathbf{S}=\{\vec{e_1},\ldots,\vec{e_k}\}$$
is some constant, orthonormal basis of $\Bbb{R}^k$.

What follows is to be taken with a grain of salt. To compute the directional derivative, we start with the gradient. Its components are collected in the matrix $\mathbf{G}$:

$$\mathbf{G}=\begin{bmatrix}\frac{\partial f_1(x_1,\ldots,x_n)}{\partial x_1} & \cdots &\frac{\partial f_1(x_1,\ldots,x_n)}{\partial x_n}\\ \vdots & \ddots & \vdots\\\frac{\partial f_k(x_1,\ldots,x_n)}{\partial x_1}&\cdots&\frac{\partial f_k(x_1,\ldots,x_n)}{\partial x_n}\end{bmatrix}.$$
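For concreteness, here is a minimal numerical sketch (Python/NumPy) of how $\mathbf{G}$ can be assembled by central finite differences; the example field `a_field`, the point `x0`, and the step size `h` are assumptions made only for illustration.

```python
import numpy as np

# Hypothetical example field a: R^2 -> R^3 (k = 3 components, n = 2 variables),
# chosen only to illustrate the construction of G.
def a_field(x):
    x1, x2 = x
    return np.array([x1 * x2,          # f_1
                     np.sin(x1),       # f_2
                     x1**2 + x2**2])   # f_3

def gradient_matrix(f, x, h=1e-6):
    """Build G[i, j] = d f_i / d x_j by central finite differences."""
    x = np.asarray(x, dtype=float)
    n = x.size
    k = f(x).size
    G = np.zeros((k, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = h
        G[:, j] = (f(x + step) - f(x - step)) / (2 * h)
    return G

x0 = np.array([1.0, 2.0])
G = gradient_matrix(a_field, x0)
print(G)   # rows: components f_i, columns: variables x_j
```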

The gradient $\vec{\nabla}\vec{a}$ itself is given by the double sum

$$\vec{\nabla}\vec{a}=\sum_{i=1}^{k}\sum_{j=1}^{n}\frac{\partial f_i(x_1,\ldots,x_n)}{\partial x_j}\vec{e_i}\otimes\vec{e_j}.$$
When dealing with scalar-valued functions, the derivative in the direction of some vector $\vec{u}$ would be the projection of the gradient onto $\vec{u}$.

Assuming this still holds, the directional derivative $\mathrm{D}_{\vec{u}}(\vec{a})$ of $\vec{a}$ is

$$\mathrm{D}_{\vec{u}}(\vec{a})=\vec{\nabla}\vec{a}\cdot\frac{\vec{u}}{|\vec{u}|}.$$

Substituting in our double sum:

$$\mathrm{D}_{\vec{u}}(\vec{a})=\left(\sum_{i=1}^{k}\sum_{j=1}^{n}\frac{\partial f_i(x_1,\ldots,x_n)}{\partial x_j}\vec{e_i}\otimes\vec{e_j}\right)\cdot\frac{\vec{u}}{|\vec{u}|}.$$

Question: Is this generalisation for $\mathrm{D}_{\vec{u}}(\vec{a})$ true?

  • If so, how does one evaluate it?
  • If not, what is the proper way to find a directional derivative of a vector field?

Appendix

The sign $\otimes$ denotes the tensor product. Here, we have the tensor product of basis vectors.

Furthermore, following the article on dyadics on Wikipedia, it seems that for an orthonormal basis $$\mathrm{D}_{\vec{u}}(\vec{a})=\frac{\vec{u}}{|\vec{u}|}\mathbf{G}.$$ So if $\vec{u}=\vec{e_m}$, then $$\mathrm{D}_{\vec{e_m}}(\vec{a})=\vec{e_m}\mathbf{G}.$$ This makes no sense unless it is some kind of tensor contraction… In that case, $$\mathrm{D}_{\vec{e_m}}(\vec{a})=\begin{bmatrix}\sum_{i=1}^{k}e_iG_{i1}\\ \vdots \\ \sum_{i=1}^{k}e_iG_{in}\end{bmatrix}.$$

Here $e_i$ denotes the $i^{th}$ component of $\vec{e_m}$ and $G_{ij}$ denotes the $ij^{th}$ component of $\mathbf{G}$. Since the basis is orthonormal, the only nonzero component is $e_m=1$:

$$\mathrm{D}_{\vec{e_m}}(\vec{a})=\begin{bmatrix}e_mG_{m1}\\ \vdots \\ e_mG_{mn}\end{bmatrix}=\begin{bmatrix}G_{m1}\\ \vdots \\ G_{mn}\end{bmatrix}.$$

This seems to be the $m^{th}$ row of $\mathbf{G}$ transposed. And in derivative form,

$$\mathrm{D}_{\vec{e_m}}(\vec{a})=\begin{bmatrix}\frac{\partial f_m(x_1,\ldots,x_n)}{\partial x_1}\\ \vdots \\ \frac{\partial f_m(x_1,\ldots,x_n)}{\partial x_n}\end{bmatrix}.$$

Best Answer

Suppose the vector-valued function $\mathbf{f}:\mathbb{R}^n\rightarrow\mathbb{R}^m$ has the (total) derivative at $\mathbf{x_0}\in \mathbb{R}^n$, denoted by $\mathrm{d}_\mathbf{x_0}\mathbf{f}$. It is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$: applied to the vector variable $\mathbf{x}$ near $\mathbf{x_0}$, it gives the (total) differential $\mathrm{d}_\mathbf{x_0}\mathbf{f}\left(\mathbf{x}-\mathbf{x_0}\right)$ of $\mathbf{f}$ at $\mathbf{x_0}$, a function mapping from $\mathbb{R}^n$ to $\mathbb{R}^m$. With respect to the standard basis sets $\left\{\mathbf{\hat{a}}_i\right\}_{i=1}^{n}$ and $\left\{\mathbf{\hat{b}}_i\right\}_{i=1}^{m}$ of $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively, the total derivative $\mathrm{d}_\mathbf{x_0}\mathbf{f}$ corresponds to the $m \times n$ matrix called the Jacobian matrix (summation over repeated indices is implied here and below):

$$\left(\mathrm{d}_\mathbf{x_0}\mathbf{f}\right)=\left(\begin{matrix}\frac{\partial f_1}{\partial x_1}&&\cdots&&\frac{\partial f_1}{\partial x_n}\\\vdots&&\ddots&&\vdots\\\frac{\partial f_m}{\partial x_1}&&\cdots&&\frac{\partial f_m}{\partial x_n}\end{matrix}\right),\\\mathbf{x_0}=x_i \mathbf{\hat{a}}_i,\mathbf{f}\left(\mathbf{x}\right)=f_i\left(\mathbf{x}\right)\mathbf{\hat{b}}_i$$

On the other hand, the gradient of $\mathbf{f}$, denoted by $\nabla\mathbf{f}$, is a linear transformation from $\mathbb{R}^m$ back to $\mathbb{R}^n$, defined with respect to the same standard basis sets so that its corresponding matrix is the $n \times m$ matrix

$$\left(\nabla\mathbf{f}\right)=\left(\begin{matrix}\frac{\partial f_1}{\partial x_1}&&\cdots&&\frac{\partial f_m}{\partial x_1}\\\vdots&&\ddots&&\vdots\\\frac{\partial f_1}{\partial x_n}&&\cdots&&\frac{\partial f_m}{\partial x_n}\end{matrix}\right)$$

Note, at least with respect to standard basis sets, that the gradient is the transpose of the total derivative.

The variation of the function $\mathbf{f}$ at $\mathbf{x_0}$ in the direction of a unit vector $\mathbf{u} \in \mathbb{R}^n$, i.e. the directional derivative of $\mathbf{f}$, denoted by $\mathrm{D}_\mathbf{u}\mathbf{f}\left(\mathbf{x_0}\right)$, is a vector in $\mathbb{R}^m$ obtained by applying the total derivative to $\mathbf{u}$:

$$\mathrm{D}_\mathbf{u}\mathbf{f}\left(\mathbf{x_0}\right)=\mathrm{d}_\mathbf{x_0}\mathbf{f}\mathbf{u}$$
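As a quick numerical check of this formula (a sketch only: the function `f`, the point `x0`, and the direction `u` below are assumptions chosen for illustration), the Jacobian applied to $\mathbf{u}$ should match the difference quotient $\left(\mathbf{f}(\mathbf{x_0}+t\mathbf{u})-\mathbf{f}(\mathbf{x_0})\right)/t$ for small $t$:

```python
import numpy as np

# Illustrative f: R^2 -> R^3 with an assumed evaluation point and unit direction.
def f(x):
    x1, x2 = x
    return np.array([x1 * x2, np.sin(x1), x1**2 + x2**2])

def jacobian(f, x, h=1e-6):
    """m x n Jacobian of f at x by central finite differences."""
    x = np.asarray(x, dtype=float)
    J = np.zeros((f(x).size, x.size))
    for j in range(x.size):
        step = np.zeros(x.size)
        step[j] = h
        J[:, j] = (f(x + step) - f(x - step)) / (2 * h)
    return J

x0 = np.array([1.0, 2.0])
u = np.array([3.0, 4.0])
u = u / np.linalg.norm(u)                   # unit direction

directional = jacobian(f, x0) @ u           # (d_{x0} f) u
t = 1e-6
limit_approx = (f(x0 + t * u) - f(x0)) / t  # difference quotient
print(directional, limit_approx)            # should agree closely
```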

In the special case $m=1$, i.e. a scalar-valued function, the right-hand side of the equation above is the product of a $1\times n$ matrix and an $n\times1$ matrix, which is a scalar. In this particular case the matrix product also equals the dot product of the gradient of the function with the unit vector. That is why, for scalar functions, textbooks always link the gradient to the directional derivative through the dot product as the rule of calculation. However, the dot-product rule cannot be generalized directly to vector-valued functions.
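A small sketch of the scalar case, with an assumed example function $g:\mathbb{R}^3\rightarrow\mathbb{R}$: the $1\times n$ matrix acting on $\mathbf{u}$ and the dot product $\nabla g\cdot\mathbf{u}$ give the same number.

```python
import numpy as np

# Assumed scalar example g: R^3 -> R.
def g(x):
    return x[0]**2 + 3.0 * x[1] * x[2]

def grad(g, x, h=1e-6):
    """Gradient of a scalar function by central finite differences."""
    x = np.asarray(x, dtype=float)
    return np.array([(g(x + h * e) - g(x - h * e)) / (2 * h)
                     for e in np.eye(x.size)])

x0 = np.array([1.0, 2.0, -1.0])
u = np.array([1.0, 1.0, 0.0])
u = u / np.linalg.norm(u)

row_times_u = grad(g, x0).reshape(1, -1) @ u   # 1 x n matrix times n x 1 vector
dot_product = np.dot(grad(g, x0), u)           # gradient . u
print(row_times_u, dot_product)                # same number either way
```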

About your appendix: if we use tensor products of the basis vectors of two vector spaces as a basis in which to express a linear transformation between those spaces, we must be careful about the dimensions. Tensor analysis already has a rigorous and general definition; here, let us set up only what is needed for the specific purpose of this question.

Let $\mathcal{V}_n$ denote an $n$-dimensional vector space over $\mathbb{R}$. A linear transformation $\mathbf{A}:\mathcal{V}_n\rightarrow\mathcal{W}_m$ has an $m\times n$ matrix representation $A_{ji}$ with respect to basis sets $\left\{\mathbf{\hat{e}}_i\right\}\subset\mathcal{V}_n$ and $\left\{\mathbf{\hat{f}}_i\right\}\subset\mathcal{W}_m$, obtained by letting $\mathbf{A}$ act on the former to give the $n$ vectors $\mathbf{u}_i=\mathbf{A}\mathbf{\hat{e}}_i=A_{ji}\mathbf{\hat{f}}_j$. Acting on a vector $\mathbf{c}\in\mathcal{V}_n$ we get $\mathbf{d}=\mathbf{Ac}\in\mathcal{W}_m$. Under these basis sets the transformation is computed as the matrix product

$$\left(\begin{matrix}d_1\\\vdots\\d_m\end{matrix}\right)=\left(\begin{matrix}A_{11}&&\cdots&&A_{1n}\\\vdots&&&&\vdots\\A_{m1}&&\cdots&&A_{mn}\end{matrix}\right)\left(\begin{matrix}c_1\\\vdots\\c_n\end{matrix}\right)$$

or $d_j=A_{ji}c_i$.
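In code this index form is just an ordinary matrix-vector product; a one-line check with assumed numbers (the matrix `A` and vector `c` are illustrative only):

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)       # an assumed 2 x 3 matrix (m = 2, n = 3)
c = np.array([1.0, -1.0, 2.0])         # an assumed vector in V_n

d_matrix = A @ c                        # matrix form  d = A c
d_index  = np.einsum('ji,i->j', A, c)   # index form   d_j = A_{ji} c_i
print(np.allclose(d_matrix, d_index))   # True
```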

On the other hand, if under a certain definition of the tensor product of two vectors, the tensor product of $\mathbf{v}\in\mathcal{V}_n$ and $\mathbf{w}\in\mathcal{W}_m$ is expressed with respect to the same basis sets as $\mathbf{v}\otimes\mathbf{w}=v_i w_j \mathbf{\hat{e}}_i\otimes\mathbf{\hat{f}}_j$, then the resulting tensor corresponds to the $n\times m$ matrix representation $v_iw_j$. To construct from $\mathbf{v}$ and $\mathbf{w}$ a tensor that can act on vectors of $\mathcal{V}_n$, we have to use $\mathbf{w}\otimes\mathbf{v}$ instead.

Therefore, to express the linear transformation $\mathbf{A}$ in terms of the two basis sets, it should take the form $\mathbf{A}=A^\prime_{ij}\mathbf{\hat{f}}_i\otimes\mathbf{\hat{e}}_j$. To see the relation between the two $m\times n$ matrices $A_{ji}$ and $A^\prime_{ij}$, we apply $\mathbf{A}$ again to the vector $\mathbf{c}$, this time using the expression with $A^\prime_{ij}$, and require the result to be $\mathbf{d}$. In this case we get $\mathbf{d}=A^\prime_{ij}c_k\left(\mathbf{\hat{f}}_i\otimes\mathbf{\hat{e}}_j\right)\mathbf{\hat{e}}_k$.

To proceed we need an additional postulate in the present discussion, namely the rule

$$\left(\mathbf{w}\otimes\mathbf{v}\right)\mathbf{c}=\mathbf{w}\left(\mathbf{v}\cdot\mathbf{c}\right)$$

Then $\mathbf{d}=A^\prime_{ij}c_k\left(\mathbf{\hat{f}}_i\otimes\mathbf{\hat{e}}_j\right)\mathbf{\hat{e}}_k=A^\prime_{ij}c_k\mathbf{\hat{f}}_i\left(\mathbf{\hat{e}}_j\cdot\mathbf{\hat{e}}_k\right)$. We would again have to stop here, except in the special case that $\left\{\mathbf{\hat{e}}_i\right\}$ is an orthonormal basis set. In that case $\mathbf{d}=A^\prime_{ij}c_j\mathbf{\hat{f}}_i$, or $d_i=A^\prime_{ij}c_j$. Comparing with $d_j=A_{ji}c_i$ above and keeping careful track of the subscripts, we find $A^\prime_{ij}=A_{ij}$ for $i=1,\cdots,m$, $j=1,\cdots,n$.

We can now conclude that the linear transformation $\mathbf{A}:\mathcal{V}_n\rightarrow\mathcal{W}_m$ can be expressed with respect to orthonormal basis sets $\left\{\mathbf{\hat{e}}_i\right\}\subset\mathcal{V}_n$ and $\left\{\mathbf{\hat{f}}_i\right\}\subset\mathcal{W}_m$ as $\mathbf{A}=A_{ij}\mathbf{\hat{f}}_i\otimes\mathbf{\hat{e}}_j$ under the rule $\left(\mathbf{w}\otimes\mathbf{v}\right)\mathbf{c}=\mathbf{w}\left(\mathbf{v}\cdot\mathbf{c}\right)$.
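A short sketch of this conclusion (the matrix `A` and vector `c` are assumed values for illustration): rebuilding $\mathbf{A}$ as the sum $A_{ij}\,\mathbf{\hat{f}}_i\otimes\mathbf{\hat{e}}_j$ and letting each term act via the rule $\left(\mathbf{w}\otimes\mathbf{v}\right)\mathbf{c}=\mathbf{w}\left(\mathbf{v}\cdot\mathbf{c}\right)$ reproduces the ordinary matrix product $\mathbf{Ac}$.

```python
import numpy as np

m, n = 2, 3
A = np.arange(6.0).reshape(m, n)        # assumed matrix of the linear map A
c = np.array([1.0, -1.0, 2.0])          # assumed vector in V_n

e = np.eye(n)                           # orthonormal basis {e_j} of V_n
f = np.eye(m)                           # orthonormal basis {f_i} of W_m

# Rebuild A as the sum A_{ij} f_i (x) e_j, where (w (x) v) c = w (v . c)
d = np.zeros(m)
for i in range(m):
    for j in range(n):
        d += A[i, j] * f[i] * np.dot(e[j], c)   # (f_i (x) e_j) c = f_i (e_j . c)

print(np.allclose(d, A @ c))            # True: the tensor expression acts like A
```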

So the (total) derivative of the function $\mathbf{f}$, $\mathrm{d}_\mathbf{x_0}\mathbf{f}$, i.e. the "correct" linear transformation to apply to a unit vector to get a directional derivative, should be expressed as $\mathrm{d}_\mathbf{x_0}\mathbf{f}=\frac{\partial f_i}{\partial x_j}\mathbf{\hat{b}}_i\otimes\mathbf{\hat{a}}_j$ (since the standard bases are orthonormal). Acting on the unit vector $\mathbf{u}=u_j\mathbf{\hat{a}}_j$ then gives $\mathrm{D}_\mathbf{u}\mathbf{f}\left(\mathbf{x_0}\right)=\frac{\partial f_i}{\partial x_j}u_j\mathbf{\hat{b}}_i$, i.e. the Jacobian matrix applied to $\mathbf{u}$.
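Putting the pieces together, a final sketch (reusing the same assumed example `f`, point `x0`, and direction `u` as above): build $\mathrm{d}_\mathbf{x_0}\mathbf{f}$ as the sum of outer products $\frac{\partial f_i}{\partial x_j}\mathbf{\hat{b}}_i\otimes\mathbf{\hat{a}}_j$, let it act on $\mathbf{u}$ by the rule above, and compare with the Jacobian-times-$\mathbf{u}$ directional derivative.

```python
import numpy as np

def f(x):
    x1, x2 = x
    return np.array([x1 * x2, np.sin(x1), x1**2 + x2**2])   # assumed example

def jacobian(f, x, h=1e-6):
    x = np.asarray(x, dtype=float)
    J = np.zeros((f(x).size, x.size))
    for j in range(x.size):
        step = np.zeros(x.size)
        step[j] = h
        J[:, j] = (f(x + step) - f(x - step)) / (2 * h)
    return J

x0 = np.array([1.0, 2.0])
u = np.array([3.0, 4.0])
u = u / np.linalg.norm(u)

J = jacobian(f, x0)
a = np.eye(x0.size)                     # standard basis {a_j} of R^n
b = np.eye(J.shape[0])                  # standard basis {b_i} of R^m

# d_{x0}f = sum_ij (df_i/dx_j) b_i (x) a_j, acting on u via (w (x) v) u = w (v . u)
D_u = sum(J[i, j] * b[i] * np.dot(a[j], u)
          for i in range(J.shape[0]) for j in range(J.shape[1]))

print(np.allclose(D_u, J @ u))          # True: same directional derivative
```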
