The geometric proof for the Cartesian formula of the gradient

geometry, vector-analysis

In general, I have a good understanding of what the gradient is. Given the Cartesian formula for the gradient, I can transform it into different coordinate systems. I am also familiar with the general, tensor-based definition: partial derivatives contracted with the contravariant basis vectors.

However, every explanation I have seen starts by defining the gradient as the vector whose dot product with any other vector gives the directional derivative along that second vector. From there we get the Cartesian formula for the gradient, $\nabla f = \frac{\partial f}{\partial x}\vec{i} + \frac{\partial f}{\partial y}\vec{j} +\frac{\partial f}{\partial z}\vec{k}$, from which everything else follows.

What I still can't figure out is how this Cartesian formula is derived. I assume this is fundamentally a geometric problem. Is there a proof showing how to derive the Cartesian formula for the gradient? Or is it more of an educated guess that you can later prove has the desired properties (i.e., it points in the direction of greatest increase, and it gives you the directional derivative when dotted with a vector)?

Thanks in advance for any help.

Cheers
Michael

Best Answer

$\newcommand{\dd}{\partial}$Let's say for definiteness that $f$ is a real-valued function of three variables, defined in some neighborhood of a point $x_{0}$. (The story holds with obvious modifications for any number of variables.)

The "primitive" concept is "differentiability": We say $f$ is differentiable at $x_{0}$ if there exists a linear function $L$ of three variables satisfying $$ \lim_{h \to 0} \frac{|f(x_{0} + h) - f(x_{0}) - Lh|}{|h|} = 0. $$ One straightforwardly shows $L$ is unique, and introduces the notation $Df(x_{0}) := L$.
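As a quick concrete check of this definition (my own example, not part of the original answer): take $f(x) = x \cdot x = |x|^{2}$. Then $$ f(x_{0} + h) - f(x_{0}) = 2 x_{0} \cdot h + |h|^{2}, $$ so the natural candidate for the linear function is $Lh = 2 x_{0} \cdot h$, and the remainder satisfies $\frac{|h|^{2}}{|h|} = |h| \to 0$. Hence $f$ is differentiable at $x_{0}$ with $Df(x_{0})h = 2 x_{0} \cdot h$.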

The matrix of $Df(x_{0})$ with respect to the standard basis has the partial derivatives of $f$ as components, more-or-less by definition of a partial derivative: $$ Df(x_{0}) = [\begin{array}{@{}ccc@{}} D_{1}f(x_{0}) & D_{2}f(x_{0}) & D_{3}f(x_{0}) \\ \end{array}] = \Bigl[\begin{array}{@{}ccc@{}} \frac{\dd f}{\dd x}(x_{0}) & \frac{\dd f}{\dd y}(x_{0}) & \frac{\dd f}{\dd z}(x_{0}) \\ \end{array}\Bigr]. $$

Because of how linear functions work (which amounts to the chain rule here), if $v = (v_{1}, v_{2}, v_{3})$ is a vector, then \begin{align*} \frac{d}{dt}\bigg|_{t=0} f(x_{0} + tv) = Df(x_{0})v &= v_{1} D_{1}f(x_{0}) + v_{2} D_{2}f(x_{0}) + v_{3} D_{3}f(x_{0}) \\ &= v_{1} \frac{\dd f}{\dd x}(x_{0}) + v_{2} \frac{\dd f}{\dd y}(x_{0}) + v_{3} \frac{\dd f}{\dd z}(x_{0}). \end{align*}
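This identity is easy to check numerically. Here is a sketch of my own (the function `f` below is an arbitrary smooth example, not from the answer): compare the limit definition of the directional derivative with the sum $v_{1} D_{1}f + v_{2} D_{2}f + v_{3} D_{3}f$.

```python
# Numerical sanity check: d/dt|_{t=0} f(x0 + t v) should equal
# sum_i v_i * (partial_i f)(x0), up to finite-difference error.
import math

def f(x, y, z):
    # an arbitrary smooth example function
    return x**2 * y + math.sin(z)

def partial(g, x0, i, h=1e-6):
    # central-difference approximation of the i-th partial derivative at x0
    xp, xm = list(x0), list(x0)
    xp[i] += h
    xm[i] -= h
    return (g(*xp) - g(*xm)) / (2 * h)

x0 = (1.0, 2.0, 0.5)
v = (0.3, -1.0, 2.0)

# right-hand side: Df(x0) v = v1 D1f(x0) + v2 D2f(x0) + v3 D3f(x0)
rhs = sum(v[i] * partial(f, x0, i) for i in range(3))

# left-hand side: d/dt|_{t=0} f(x0 + t v), again by central difference
t = 1e-6
fp = f(*(x0[i] + t * v[i] for i in range(3)))
fm = f(*(x0[i] - t * v[i] for i in range(3)))
lhs = (fp - fm) / (2 * t)

print(abs(lhs - rhs) < 1e-5)  # True: the two sides agree up to discretization error
```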

This expression may be interpreted as a Euclidean dot product. We're led to give a name to the resulting vector $\nabla f(x_{0})$, a.k.a. the transpose of $Df(x_{0})$. That's "where the gradient comes from".

As for properties of the gradient, they follow from the preceding equation and properties of the dot product. If $\theta$ is the angle between $\nabla f(x_{0})$ and $v$, the directional derivative of $f$ at $x_{0}$ along $v$ is $$ D_{v}f(x_{0}) = \frac{d}{dt}\bigg|_{t=0} f(x_{0} + tv) = \nabla f(x_{0}) \cdot v = |\nabla f(x_{0})|\, |v| \cos\theta. $$ For $|v|$ fixed, $D_{v}f(x_{0})$ is maximized when $\theta = 0$ ($v$ is positively proportional to the gradient), minimized when $\theta = \pi$ ($v$ is negatively proportional to the gradient), and $0$ when $\theta = \pi/2$ ($v$ is orthogonal to the gradient).
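These properties can also be probed numerically. Below is a sketch of my own, using an arbitrary example function $f(x,y,z) = xy + z^{2}$ with its gradient $(y, x, 2z)$ computed analytically at a sample point: over many random unit directions $v$, $\nabla f \cdot v$ never exceeds $|\nabla f|$, and the unit vector along the gradient attains that bound.

```python
import math
import random

# Example of my own: f(x, y, z) = x*y + z^2 has gradient (y, x, 2z);
# at x0 = (1, -2, 0.5) the gradient is (-2, 1, 1).
g = (-2.0, 1.0, 1.0)
gnorm = math.sqrt(sum(c * c for c in g))

random.seed(0)
best = -float("inf")
for _ in range(10_000):
    # sample a random unit vector v
    w = [random.gauss(0.0, 1.0) for _ in range(3)]
    n = math.sqrt(sum(c * c for c in w))
    v = [c / n for c in w]
    d = sum(gi * vi for gi, vi in zip(g, v))  # D_v f = grad f . v
    assert d <= gnorm + 1e-12                 # |grad f . v| <= |grad f| |v| = |grad f|
    best = max(best, d)

# the unit vector along the gradient attains the bound (up to rounding)
v_star = [c / gnorm for c in g]
d_star = sum(gi * vi for gi, vi in zip(g, v_star))
print(abs(d_star - gnorm) < 1e-12)
```

None of the $10{,}000$ random directions beats the gradient direction, which is exactly the "direction of greatest increase" claim.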