[Math] Geometric intuition for directional derivatives

derivatives, multivariable-calculus, partial-derivative

What I'm trying to do in this post is check whether the intuition I've built is correct and, if it's not, I would like someone to share their own intuition for why directional derivatives are related to the gradient vector.

My intuition:

The formal definition of a directional derivative is:
$$
\frac{\partial f}{\partial \vec{v}} =\nabla f(a,b) \cdot \vec{v}
$$

where $\vec{v}$ is the vector that indicates the direction in which we want to compute the rate of change.

By the definition of partial derivatives, when we compute $\frac{\partial f}{\partial x}$, we're fixing a plane of constant $y$ and analysing how a tiny change in $x$ affects our output. The same happens with $\frac{\partial f}{\partial y}$: we fix a plane of constant $x$ and analyse how a tiny change in $y$ affects our output.

Now, when we compute a directional derivative in the direction of $\vec{v}$, what we're doing (in my head) is fixing a plane, $\beta$, that has $\vec{v}$ as one of its direction vectors and intersects the surface. Just like the picture below:

[image: the plane $\beta$ intersecting the surface]

Because $\beta$ has $\vec{v}$ as one of its direction vectors, what we're essentially doing is checking what a tiny change in the direction of $\vec{v}$ does to our output (the surface). But we already know what a tiny change in $x$ and a tiny change in $y$ cause: $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$, respectively.

Assuming that everything I've said is correct, we can decompose $\vec{v}$ into a linear combination of the basis vectors of our space, in this case the standard basis:
$$
\vec{v} = \left[\begin{matrix}
a\\
b\\
c\\
\end{matrix}\right] = a \cdot \left[\begin{matrix}
1\\
0\\
0\\
\end{matrix}\right] + b \cdot \left[\begin{matrix}
0\\
1\\
0\\
\end{matrix}\right] + c \cdot \left[\begin{matrix}
0\\
0\\
1\\
\end{matrix}\right]
$$

Ignoring the third vector $\left[\begin{matrix}
0\\
0\\
1\\
\end{matrix}\right]$
because it deals with our output, we can see $\left[\begin{matrix}
1\\
0\\
0\\
\end{matrix}\right]$
and $\left[\begin{matrix}
0\\
1\\
0\\
\end{matrix}\right]$
as changes in the $x$ and $y$ directions, whose effects are given by the partial derivatives, and we're looking at what a tiny change in the direction of $\vec{v}$ does to our output, hence:

$$
\frac{\partial f}{\partial \vec{v}} = a \cdot \frac{\partial f}{\partial x} + b \cdot \frac{\partial f}{\partial y}\\
\frac{\partial f}{\partial \vec{v}} = \nabla f(a,b) \cdot \vec{v}
$$

Am I correct?

Best Answer

Let’s back up a bit. As Hans Ludmark points out in his comment above, the basic definition of the directional derivative in the direction specified by the unit vector $\mathbf u=(u_1,u_2)$ at a point $P=(a,b)$ is via a limit similar to the one from elementary calculus: $${\partial f\over\partial\mathbf u}(a,b)=\lim_{h\to0}{f(a+hu_1,b+hu_2)-f(a,b)\over h}.$$ As you’ve observed, this amounts to taking a vertical slice through the surface and then computing the ordinary derivative of that slice, as illustrated below.

[image: directional derivative]

This derivative is, of course, the slope of the tangent line (blue) to the slice at that point. Observe that this line is also the intersection of the tangent plane at that point (grayish blue) with the cutting plane (violet), so we can interpret the directional derivative as the steepness of the tangent plane in a given direction. As you rotate the cutting plane around $P$, the slope of this line changes, reaching a maximum when the two planes are perpendicular, as we’ll see below. (You can also see that this is the case by visualizing cutting a cylinder parallel to the $z$-axis by a plane and imagining what happens to the high point as you move that plane around.)
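The limit definition above can be checked numerically against the dot-product formula. The following is only a sketch: the surface `f`, its hand-computed gradient `grad_f`, the point $(1,2)$, the angle and the step size `h` are all illustrative assumptions, not part of the original question.

```python
import math

def f(x, y):
    # Hypothetical example surface, chosen only for illustration.
    return x**2 * y + math.sin(y)

def grad_f(x, y):
    # Partial derivatives of the example f, computed by hand.
    return (2 * x * y, x**2 + math.cos(y))

def directional_derivative(func, a, b, u, h=1e-6):
    # Difference quotient from the limit definition, with a small fixed h
    # standing in for the limit h -> 0.
    u1, u2 = u
    return (func(a + h * u1, b + h * u2) - func(a, b)) / h

a, b = 1.0, 2.0
theta = 0.7                               # arbitrary direction angle
u = (math.cos(theta), math.sin(theta))    # unit vector

numeric = directional_derivative(f, a, b, u)
gx, gy = grad_f(a, b)
exact = gx * u[0] + gy * u[1]             # gradient dotted with u

print(numeric, exact)
```

The two printed values should agree to several decimal places; the small gap comes from using a finite `h` instead of the limit.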

Let’s say that the tangent plane is given by the equation $\lambda x+\mu y-z=d$ with normal $\mathbf n_t=(\lambda,\mu,-1)$. A normal to the cutting plane is $\mathbf n_c=(-u_2,u_1,0)$, which is just $\mathbf u$ rotated ninety degrees. In $\mathbb R^3$ we can find the direction of the line of intersection via a cross product: $$\mathbf n_t\times\mathbf n_c=(u_1,u_2,\lambda u_1+\mu u_2)$$ and the slope of this line is thus $${\lambda u_1+\mu u_2\over\sqrt{u_1^2+u_2^2}}=\lambda u_1+\mu u_2=(\lambda,\mu)\cdot\mathbf u=\|(\lambda,\mu)\|\cos\phi,$$ where $\phi$ is the angle between the projection of $\mathbf n_t$ onto the $x$-$y$ plane and $\mathbf u$. The slope is therefore maximal when $\phi=0$, i.e., when $\mathbf u$ and the projection of $\mathbf n_t$ point in the same direction, but this happens when the two planes are perpendicular. The maximum value of this slope is $\|(\lambda,\mu)\|$.
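The cross-product computation above can also be checked by brute force. In this sketch the values `lam = 3` and `mu = 4` and the grid of 3600 angles are arbitrary choices for illustration; `cross` and `slope_along` are hypothetical helper names.

```python
import math

def cross(p, q):
    # Cross product of two 3-vectors given as tuples.
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def slope_along(lam, mu, theta):
    # Slope of the line where the plane lam*x + mu*y - z = d meets the
    # vertical cutting plane through direction u = (cos theta, sin theta).
    u1, u2 = math.cos(theta), math.sin(theta)
    n_t = (lam, mu, -1.0)       # normal to the tangent plane
    n_c = (-u2, u1, 0.0)        # normal to the cutting plane
    d = cross(n_t, n_c)         # direction of the line of intersection
    return d[2] / math.hypot(d[0], d[1])   # rise over run

lam, mu = 3.0, 4.0              # illustrative plane coefficients
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best = max(angles, key=lambda t: slope_along(lam, mu, t))

print(best, math.atan2(mu, lam))                        # steepest direction
print(slope_along(lam, mu, best), math.hypot(lam, mu))  # steepest slope
```

Up to the grid resolution, the steepest direction should match the direction of $(\lambda,\mu)$ and the steepest slope should match $\|(\lambda,\mu)\|$.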

This is where the gradient of $f$ comes in. If we write the equation of the surface as $F(x,y,z)=f(x,y)-z=0$, then $\nabla F=(f_x,f_y,-1)$ is normal to the surface, so an equation of the tangent plane at $(a,b,f(a,b))$ is $$xf_x(a,b)+yf_y(a,b)-z=af_x(a,b)+bf_y(a,b)-f(a,b).$$ This is exactly in the form analyzed above, with $\lambda=f_x(a,b)$ and $\mu=f_y(a,b)$, so $${\partial f\over\partial\mathbf u}(a,b)=\nabla f(a,b)\cdot\mathbf u$$ with the maximal rate of change given by $\|\nabla f(a,b)\|$.
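As a concrete illustration (the function, point and direction here are chosen only as an example), take $f(x,y)=x^2+y^2$ at $(a,b)=(1,2)$. Then $f_x(1,2)=2$, $f_y(1,2)=4$ and $f(1,2)=5$, so the tangent plane is $$2x+4y-z=2\cdot1+4\cdot2-5=5,$$ which has the form $\lambda x+\mu y-z=d$ with $\lambda=2$, $\mu=4$. For the unit vector $\mathbf u=\left(\tfrac35,\tfrac45\right)$ we get $${\partial f\over\partial\mathbf u}(1,2)=(2,4)\cdot\left(\tfrac35,\tfrac45\right)=\tfrac65+\tfrac{16}5=\tfrac{22}5,$$ which is slightly less than the maximal rate of change $\|\nabla f(1,2)\|=\sqrt{2^2+4^2}=2\sqrt5\approx4.47$, as expected for a direction close to, but not equal to, that of the gradient.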

This seems awfully coincidental, but it’s not. Going back to the plane equation $\lambda x+\mu y-z=d$ above, the coefficients $\lambda$ and $\mu$ are respectively the “$x$-slope” and “$y$-slope” of the plane, i.e., the slopes of its lines of intersection with vertical planes parallel to the $x$- and $y$-axes. These slopes are encoded in the normal $(\lambda,\mu,-1)$. For the tangent plane, these slopes are the directional derivatives in the directions of the coordinate axes, also known as the partial derivatives of $f$.