[Math] difference between all of these derivatives

derivatives multivariable-calculus

In calculus II we were introduced to a bunch of new derivatives: the gradient, the derivative $D=\begin{bmatrix} \partial_{x_1} \\ \partial_{x_2} \\ \vdots \\ \partial_{x_n}\end{bmatrix}$, the Jacobian, the Hessian, the total differential, the directional derivative, the partial derivative, and something called a Frechet derivative (that one was only mentioned in passing).

I can apply the formulas to calculate these things, but what exactly are they? And how do they relate to each other?

For instance, the derivative of a function $f: \Bbb R \to \Bbb R$ gives the slope of the line tangent to $f$. Which one of the above gives you the "slope" (I don't even know what to call a $2$-D slope) of a function $g: \Bbb R^2 \to \Bbb R$? I know that the partial derivatives give you the slope in the $x$, $y$, etc. directions, but then what do the others do?

And how do they relate to each other? For instance, how does $D$ relate to say the directional derivative $\partial_{\vec v}$?

Thanks.

Best Answer

Let $f:A\subseteq\Bbb R^n\to \Bbb R$ and $\mathbf g:B\subseteq\Bbb R^m\to \Bbb R^n$ where $A,B$ are open subsets.

Directional derivative: The directional derivative of the function $\mathbf g$ in the direction of the vector $\mathbf v$ evaluated at the point $\mathbf p\in B$ is $$D_{\mathbf v}\mathbf g(\mathbf p) := \lim_{h\to 0}\frac{\mathbf g(\mathbf p+h\mathbf v)-\mathbf g(\mathbf p)}{h}$$ if the limit exists. Notice the similarity between this and the scalar derivative. The function $\mathbf g$ is said to be directionally differentiable (in the sense of Gâteaux) if the limit exists for all $\mathbf v\in B_\epsilon(\mathbf 0)$, for some $\epsilon >0$. Intuitively, the directional derivative is the (vector) slope of the function at $\mathbf p$ in the direction $\mathbf v$. These derivatives are used to define tangent vectors.
The directional derivative of $f$ is defined exactly the same, but will be scalar-valued. Alternatively, the directional derivative of $f$ can be defined as follows. Let $\gamma:\Bbb R\to\Bbb R^n$ be given by $\gamma(t)=\mathbf p + t\mathbf v$. Then we can define the directional derivative by $$D_{\mathbf v} f(\mathbf p) := (f\circ \gamma)'(0)$$ This definition has the benefit of being defined in terms of a purely scalar derivative. Confirm for yourself that these are equivalent definitions.
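
As a rough numerical illustration of the second definition, the sketch below estimates $(f\circ\gamma)'(0)$ with a central difference. The function $f(x,y)=x^2y$, the point, the direction, and the step size `h` are all arbitrary choices made for this example. By hand, $(f\circ\gamma)(t)=(1+3t)^2(2+4t)$, whose derivative at $t=0$ is $16$.

```python
def directional_derivative(f, p, v, h=1e-6):
    """Central-difference estimate of D_v f(p) = (f o gamma)'(0), gamma(t) = p + t*v."""
    forward = f([pi + h * vi for pi, vi in zip(p, v)])
    backward = f([pi - h * vi for pi, vi in zip(p, v)])
    return (forward - backward) / (2 * h)


def f(x):
    # Illustrative scalar field f(x, y) = x^2 * y (not from the answer itself)
    return x[0] ** 2 * x[1]


p = [1.0, 2.0]
v = [3.0, 4.0]
estimate = directional_derivative(f, p, v)
print(estimate)  # close to the hand-computed value 16
```

Note that $\mathbf v$ is not normalized here, matching the definition above, which does not require a unit vector.
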
Partial derivative: The partial derivative of $\mathbf g$ wrt its $i$th argument, denoted $\partial_i \mathbf g$ or $\frac{\partial \mathbf g}{\partial x^i}$, is defined as $$\partial_i \mathbf g(\mathbf p) := D_{\mathbf e_i} \mathbf g(\mathbf p)$$ where $\mathbf e_i=(0,\dots, 0,1,0,\dots,0)$ is the vector with zeros in each coordinate except for a $1$ in the $i$th coordinate. These are, in some ways, the most important directional derivatives of a function.
The partial derivative of $f$ is defined exactly the same, but will be scalar-valued.
Gradient of a scalar field: The gradient of $f$, denoted $\nabla f$, at the point $\mathbf p\in A$ is the column matrix defined implicitly by the equation $$[\nabla f(\mathbf p)] \cdot \mathbf v = D_{\mathbf v}f(\mathbf p), \quad \forall \mathbf v\in B_\epsilon(\mathbf 0)$$ This uniquely identifies the gradient independently of the coordinate system, but it'd be nice to have an explicit formula for the gradient. It is easy to show (just let $\mathbf v$ be the basis vectors) that the above definition implies that in Cartesian coordinates the gradient is given by $$\nabla f(\mathbf p) = \pmatrix{\partial_1f(\mathbf p) \\ \vdots \\ \partial_n f(\mathbf p)}$$ Intuitively, the gradient tells you the direction and magnitude of greatest slope on the $n$-surface $z=f(x^1, \dots, x^n)$ at the point $\mathbf p$.
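
The two claims in this paragraph can be checked numerically. The sketch below is only an illustration: the field $f(x,y)=x^2+3xy$, the point, and the sampling of $360$ unit directions are assumptions made for the example. It builds the gradient from directional derivatives along the basis vectors, then compares it with the slopes in other directions.

```python
import math


def directional_derivative(f, p, v, h=1e-6):
    """Central-difference estimate of D_v f(p)."""
    forward = f([pi + h * vi for pi, vi in zip(p, v)])
    backward = f([pi - h * vi for pi, vi in zip(p, v)])
    return (forward - backward) / (2 * h)


def gradient(f, p, h=1e-6):
    """Cartesian gradient: the directional derivatives along e_1, ..., e_n."""
    n = len(p)
    basis = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    return [directional_derivative(f, p, e, h) for e in basis]


def f(x):
    # Illustrative scalar field; by hand, grad f = (2x + 3y, 3x)
    return x[0] ** 2 + 3 * x[0] * x[1]


p = [1.0, 2.0]
grad = gradient(f, p)          # approximately (8, 3)
grad_norm = math.hypot(*grad)

# Defining property: grad . v equals D_v f(p) for any v
v = [0.5, -2.0]
lhs = sum(gi * vi for gi, vi in zip(grad, v))
rhs = directional_derivative(f, p, v)

# Slopes along 360 sampled unit directions; none should exceed |grad f|
angles = [2 * math.pi * k / 360 for k in range(360)]
slopes = [directional_derivative(f, p, [math.cos(a), math.sin(a)]) for a in angles]
steepest = max(slopes)
```

Here `steepest` comes out just below `grad_norm`, because the sampled directions only approximate the exact gradient direction.
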
Gradient of a vector field: The gradient of the vector function $\mathbf g$ is less standardized -- ask different people and they'll define it differently or even tell you that it's undefined. This is the definition in geometric calculus: $$ \nabla \mathbf g(\mathbf p):= \lim_{V\to \{\mathbf p\}}\frac{I^\dagger}{|V|}\oint_{\partial V} d\sigma \mathbf g$$ where $I$ is the positively oriented unit pseudoscalar of $\Bbb R^n$ and $V\subset B$ is a volume containing $\mathbf p$. It'd take a little work to explain exactly what's behind that definition, so instead here's a reference. However, just as above, the gradient simplifies if we assume a Cartesian coordinate system. In that case the gradient is given by $$\nabla \mathbf g(\mathbf p) = \sum_{i=1}^m\mathbf e_i\partial_i \mathbf g(\mathbf p)$$ Note that while this looks very similar to the definition of the gradient of a scalar function, it entails a geometric product between each $\mathbf e_i$ and $\partial_i \mathbf g(\mathbf p)$. I really have no idea what this thing is intuitively (except as the sum of the divergence and curl). It seems like it should represent some type of rotor.
Divergence: The divergence of the vector field $\mathbf g:\Bbb R^n\to\Bbb R^n$ at $\mathbf p$, denoted $\nabla \cdot \mathbf g(\mathbf p)$ or $\operatorname{div} \mathbf g(\mathbf p)$, is defined as $$\nabla \cdot \mathbf g(\mathbf p) := \lim_{V\to \{\mathbf p\}} \frac{1}{|V|}\oint_{\partial V}\mathbf g\cdot \mathbf ndS$$ where $V\subset B$ is a volume containing $\mathbf p$. This definition itself is designed to be intuitive -- it tells us that the divergence of $\mathbf g$ at $\mathbf p$ is the net flux density (density because we're dividing out the volume of $V$) of $\mathbf g$ through an infinitesimal volume at $\mathbf p$. So calculating the divergence everywhere in your space will tell you where the sources and sinks are in your vector field.
It should be noted that, like all of these tougher-looking definitions, this one simplifies in the case of Cartesian coordinates, this time to $$\nabla \cdot \mathbf g(\mathbf p) = \sum_{i=1}^n\partial_i \mathbf g_i(\mathbf p)$$ where $\mathbf g_i$ is the $i$th component of the vector function $\mathbf g$.
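
To see the flux-density definition and the Cartesian formula agree, here is a small two-dimensional sketch. In the plane the "volume" is a small square and its boundary consists of four edges. The field $\mathbf g(x,y)=(x^2, xy)$, the side length `s`, and the midpoint-rule quadrature are all assumptions of this example. By hand, $\partial_1\mathbf g_1+\partial_2\mathbf g_2 = 2x + x = 3x$.

```python
def g(x, y):
    # Illustrative planar vector field; its divergence is 3x
    return (x ** 2, x * y)


def flux_density(g, p, s=1e-2, n=50):
    """Net outward flux of g through a square of side s centred at p, divided by its area."""
    x0, y0 = p
    half = s / 2
    dt = s / n
    total = 0.0
    for k in range(n):
        t = -half + (k + 0.5) * dt                # midpoint of the k-th sub-interval
        total += g(x0 + half, y0 + t)[0] * dt     # right edge, outward normal (+1, 0)
        total -= g(x0 - half, y0 + t)[0] * dt     # left edge, outward normal (-1, 0)
        total += g(x0 + t, y0 + half)[1] * dt     # top edge, outward normal (0, +1)
        total -= g(x0 + t, y0 - half)[1] * dt     # bottom edge, outward normal (0, -1)
    return total / s ** 2


p = (1.0, 2.0)
div_from_flux = flux_density(g, p)
div_from_partials = 3 * p[0]   # hand-computed from the Cartesian formula
print(div_from_flux, div_from_partials)
```
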
Curl: The curl of the vector field $\mathbf g:\Bbb R^3\to\Bbb R^3$ at $\mathbf p$, denoted $\nabla \times \mathbf g(\mathbf p)$ or $\operatorname{curl}\mathbf g(\mathbf p)$, is defined implicitly by $$\big(\nabla \times \mathbf g(\mathbf p)\big)\cdot \mathbf n := \lim_{A\to\{\mathbf p\}} \frac{1}{|A|}\oint_{\partial A} \mathbf g\cdot d\mathbf r$$ where $A$ is a plane segment containing $\mathbf p$ (not the domain of $f$ above), $\mathbf n$ is the unit normal to $A$, and the orientation of $\partial A$ is chosen to follow the right-hand convention. (Note that the curl, at least when defined as a vector field, is only defined for vector fields on $\Bbb R^3$.) Again, this definition is meant to be intuitive. It tells us that the curl of $\mathbf g$ at $\mathbf p$ is the amount of "rotation" of $\mathbf g$ at $\mathbf p$. Intuitively, we can see $\mathbf g$ as air flow and then the curl would tell us the tendency of small floating objects to spin in that air flow.
And again, this definition simplifies for Cartesian coordinates to $$\nabla \times \mathbf g(\mathbf p) = \big(\partial_2\mathbf g_3(\mathbf p) - \partial_3\mathbf g_2(\mathbf p), \partial_3\mathbf g_1(\mathbf p) - \partial_1\mathbf g_3(\mathbf p),\partial_1\mathbf g_2(\mathbf p) - \partial_2\mathbf g_1(\mathbf p)\big)$$
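
A minimal sketch of the Cartesian formula, using central differences for the partial derivatives. The field $\mathbf g(x,y,z)=(-y,x,0)$ is an illustrative choice: it describes rigid rotation about the $z$-axis, so the curl should be $(0,0,2)$ at every point.

```python
def partial(g, p, i, j, h=1e-6):
    """Central-difference estimate of the i-th partial of component g_j at p (0-based)."""
    up = list(p)
    up[i] += h
    down = list(p)
    down[i] -= h
    return (g(up)[j] - g(down)[j]) / (2 * h)


def curl(g, p):
    """Cartesian curl of g: R^3 -> R^3, term by term as in the formula above."""
    return (
        partial(g, p, 1, 2) - partial(g, p, 2, 1),   # d2 g3 - d3 g2
        partial(g, p, 2, 0) - partial(g, p, 0, 2),   # d3 g1 - d1 g3
        partial(g, p, 0, 1) - partial(g, p, 1, 0),   # d1 g2 - d2 g1
    )


def g(x):
    # Rigid rotation about the z-axis (illustrative choice)
    return (-x[1], x[0], 0.0)


c = curl(g, [0.3, -1.2, 0.5])
print(c)  # approximately (0, 0, 2)
```
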
(Total) Derivative: The function $\mathbf g$ is said to be differentiable at $\mathbf p\in B$ if there exists a linear function $D\mathbf g(\mathbf p)\in\mathcal L(\Bbb R^m,\Bbb R^n)$ satisfying $$\lim_{\mathbf h\to \mathbf 0}\frac{\|\mathbf g(\mathbf p+\mathbf h)-\mathbf g(\mathbf p)-[D\mathbf g(\mathbf p)](\mathbf h)\|_{\Bbb R^n}}{\|\mathbf h\|_{\Bbb R^m}}=0$$ If $D\mathbf g(\mathbf p)$ exists then we call it the derivative (AKA differential) of $\mathbf g$ at $\mathbf p$. Note that if $D\mathbf g(\mathbf p)$ exists then its matrix representation is the Jacobian matrix. Intuitively $D\mathbf g(\mathbf p)$ describes how $\mathbf g$ responds to a little move away from $\mathbf p$.
The derivative of $f$ is obtained by setting $n=1$, where $\|\cdot\|_{\Bbb R^1}$ is simply the absolute value function. Note that if $Df(\mathbf p)$ exists then its matrix representation is $[\nabla f(\mathbf p)]^T$.
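
The limit in the definition can be watched numerically. In the sketch below the map $\mathbf g(x,y)=(xy,\ x+y^2)$ and its hand-computed Jacobian are assumptions of the example. The code evaluates the quotient from the definition for shrinking $\mathbf h=(t,t)$. For this particular $\mathbf g$ the remainder is $(h_1h_2,\ h_2^2)$, so the quotient equals $t$.

```python
import math


def g(x, y):
    # Illustrative map R^2 -> R^2
    return (x * y, x + y ** 2)


def jacobian_g(x, y):
    # Hand-computed Jacobian of g: rows are the gradients of the components
    return ((y, x), (1.0, 2 * y))


def remainder_ratio(p, h):
    """|g(p+h) - g(p) - Dg(p)(h)| / |h|, the quotient in the definition."""
    gp = g(*p)
    gph = g(p[0] + h[0], p[1] + h[1])
    J = jacobian_g(*p)
    lin = (J[0][0] * h[0] + J[0][1] * h[1],
           J[1][0] * h[0] + J[1][1] * h[1])
    rem = (gph[0] - gp[0] - lin[0], gph[1] - gp[1] - lin[1])
    return math.hypot(*rem) / math.hypot(*h)


p = (1.0, 2.0)
ratios = [remainder_ratio(p, (t, t)) for t in (1e-1, 1e-2, 1e-3)]
print(ratios)  # shrinks roughly like t
```

Replacing `jacobian_g` with any other matrix would leave a quotient that does not go to zero, which is why the linear map in the definition is unique when it exists.
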
Hessian: The Hessian of $f$ at $\mathbf p\in A$, denoted $Hf(\mathbf p)$, is the matrix representation of $D^2f(\mathbf p)$. It is defined as $$Hf(\mathbf p) := \begin{bmatrix} {\partial_1}^2 f(\mathbf p) & \cdots & \partial_1\partial_n f(\mathbf p) \\ \vdots & & \vdots \\ \partial_n\partial_1 f(\mathbf p) & \cdots & {\partial_n}^2 f(\mathbf p)\end{bmatrix}$$ Note that $D^2f(\mathbf p)$ is a $2$-argument function so if $\mathbf u, \mathbf v \in \Bbb R^n$ and $D^2f(\mathbf p)$ exists, then $$\big[D^2f(\mathbf p)[\mathbf u,\mathbf v]\big] = [\mathbf u]^THf(\mathbf p)[\mathbf v]$$ Intuitively this is the analog of the second derivative in scalar calculus. It gives an idea of concavity.
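
As a sketch of the identity $D^2f(\mathbf p)[\mathbf u,\mathbf v]=[\mathbf u]^THf(\mathbf p)[\mathbf v]$, the code below approximates the Hessian by central second differences and evaluates the bilinear form. The function $f(x,y)=x^2y+y^3$, the point, and the vectors are illustrative assumptions. By hand, $Hf(1,2)=\begin{bmatrix}4&2\\2&12\end{bmatrix}$.

```python
def hessian(f, p, h=1e-3):
    """Central second-difference estimate of the matrix of second partials of f at p."""
    n = len(p)

    def shifted(i, si, j, sj):
        # Evaluate f at p + si*h*e_i + sj*h*e_j
        q = list(p)
        q[i] += si * h
        q[j] += sj * h
        return f(q)

    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            H[i][j] = (shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
                       - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4 * h * h)
    return H


def bilinear(u, H, v):
    """u^T H v."""
    return sum(u[i] * H[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))


def f(x):
    # Illustrative scalar field
    return x[0] ** 2 * x[1] + x[1] ** 3


H = hessian(f, [1.0, 2.0])                     # approximately [[4, 2], [2, 12]]
value = bilinear([1.0, 2.0], H, [3.0, -1.0])   # approximately -2
```

The estimated matrix is symmetric up to rounding, as expected for a function with continuous second partials.
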

As for the connection between these, beyond what I've already stated above, the total derivative really is the one that contains all of the information of the others. The total derivative of $f$ has matrix representation $[\nabla f]^T$, whose coordinates in the Cartesian coordinate system are exactly the partial derivatives, and it has the property that $$Df(\mathbf p)(\mathbf v) = D_{\mathbf v}f(\mathbf p)$$

Very similar statements hold for $D\mathbf g$. But also, when $m=n$, $D\mathbf g$ encodes all of the information contained in the divergence and curl. $\nabla\cdot \mathbf g(\mathbf p)$ is just the trace of $D\mathbf g(\mathbf p)$, and for $n=3$ the components of $\nabla \times \mathbf g(\mathbf p)$ are exactly the three independent entries of the antisymmetric matrix $J\mathbf g(\mathbf p)-(J\mathbf g(\mathbf p))^T$, where $J\mathbf g(\mathbf p)$ is the Jacobian of $\mathbf g$ at $\mathbf p$ -- which as stated above is just the matrix representation of $D\mathbf g(\mathbf p)$.
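
A short sketch of how divergence and curl can be read off the Jacobian. The field $\mathbf g(x,y,z)=(xy,\ yz,\ zx)$ and the finite-difference Jacobian are assumptions of the example. By hand, $\nabla\cdot\mathbf g = x+y+z$ and $\nabla\times\mathbf g=(-y,-z,-x)$.

```python
def jacobian(g, p, h=1e-6):
    """J[i][j] approximates the j-th partial of component g_i, by central differences."""
    n = len(p)
    cols = []
    for j in range(n):
        up = list(p)
        up[j] += h
        down = list(p)
        down[j] -= h
        gu, gd = g(up), g(down)
        cols.append([(a - b) / (2 * h) for a, b in zip(gu, gd)])
    # cols[j][i] holds the j-th partial of g_i; transpose into J[i][j]
    return [[cols[j][i] for j in range(n)] for i in range(len(cols[0]))]


def g(x):
    # Illustrative vector field on R^3
    return (x[0] * x[1], x[1] * x[2], x[2] * x[0])


p = [1.0, 2.0, 3.0]
J = jacobian(g, p)

divergence = sum(J[i][i] for i in range(3))                      # trace of J
A = [[J[i][j] - J[j][i] for j in range(3)] for i in range(3)]    # J - J^T
curl = (A[2][1], A[0][2], A[1][0])                               # independent entries

print(divergence, curl)  # approximately 6 and (-2, -3, -1)
```

The index pattern `(A[2][1], A[0][2], A[1][0])` is what matches the Cartesian curl formula, since `J[i][j]` is the $j$th partial of the $i$th component.
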

The Fréchet derivative is defined on Banach spaces, which as you may or may not be aware are generally infinite-dimensional spaces and thus require a little bit more care. In fact, a general Banach space doesn't even have an inner product to exploit. But it does have a norm. Thus maybe it won't surprise you that the definition is given as follows. Let $V,W$ be Banach spaces and $f:V\to W$. Then $f$ is Fréchet differentiable at the point $p$ if there exists a bounded linear function $L:V\to W$ such that $$\lim_{h\to 0}\frac{\|f(p+h)-f(p)-L(h)\|_W}{\|h\|_V}=0$$ If $L$ exists then it's called the Fréchet derivative and is denoted $Df(p)$.
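
As a sketch of how this definition works in infinite dimensions (the space and the function here are illustrative choices, not part of the discussion above), take $V=C[0,1]$ with the sup norm, $W=\Bbb R$, and $f(u)=\int_0^1 u(t)^2\,dt$. Expanding the square gives $$f(u+h)-f(u) = 2\int_0^1 u(t)h(t)\,dt + \int_0^1 h(t)^2\,dt$$ The candidate $L(h)=2\int_0^1 u(t)h(t)\,dt$ is linear in $h$ and bounded, since $|L(h)|\le 2\|u\|_V\|h\|_V$. The remainder satisfies $$\frac{|f(u+h)-f(u)-L(h)|}{\|h\|_V} = \frac{\int_0^1 h(t)^2\,dt}{\|h\|_V} \le \|h\|_V \to 0$$ so $f$ is Fréchet differentiable at every $u$, with $Df(u)=L$.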

Since you asked about the tangent plane to a surface, I'll share this as well:

Let $\mathbf x: A \subseteq \Bbb R^2 \to S \subseteq \Bbb R^n$ parametrize a surface. Let $\mathbf q\in A$ and set $\mathbf p=\mathbf x(\mathbf q)$. Then for any $\mathbf w\in\Bbb R^2$ the vector $D_{\mathbf w}\mathbf x(\mathbf q)$ is a tangent vector to $S$ at $\mathbf p$. The set of all tangent vectors (in this case, the tangent plane) at $\mathbf p$ is the tangent space to $S$ at $\mathbf p$. It's denoted $T_{\mathbf p}$.
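
For a concrete picture, the sketch below uses the standard spherical parametrization of the unit sphere in $\Bbb R^3$ (an illustrative choice of $\mathbf x$; the point and directions are arbitrary). On the unit sphere the normal at $\mathbf p$ is $\mathbf p$ itself, so every tangent vector $D_{\mathbf w}\mathbf x(\mathbf q)$ should be orthogonal to $\mathbf p$.

```python
import math


def x(theta, phi):
    # Spherical parametrization of the unit sphere (illustrative choice of surface)
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def tangent_vector(q, w, h=1e-6):
    """Central-difference estimate of D_w x(q)."""
    fwd = x(q[0] + h * w[0], q[1] + h * w[1])
    bwd = x(q[0] - h * w[0], q[1] - h * w[1])
    return tuple((a - b) / (2 * h) for a, b in zip(fwd, bwd))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


q = (1.0, 0.5)
p = x(*q)

t1 = tangent_vector(q, (1.0, 0.0))    # partial derivative in theta
t2 = tangent_vector(q, (0.0, 1.0))    # partial derivative in phi
t3 = tangent_vector(q, (2.0, -3.0))   # a general direction w

# Each tangent vector is orthogonal to the normal p (values near zero)
print(dot(t1, p), dot(t2, p), dot(t3, p))
```

Up to discretization error, `t3` equals `2*t1 - 3*t2`, which reflects that the two partial derivatives of $\mathbf x$ span the tangent plane at a point where they are linearly independent.
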
