Understanding the unit vector argument for proving that the gradient is the direction of steepest ascent

multivariable-calculus

Why is gradient the direction of steepest ascent?

In this question and its many answers, people argue that the gradient is the direction of steepest ascent by taking an arbitrary unit vector and observing that its dot product with the gradient is maximized when the two point in the same direction. But how does this prove that the gradient is the direction of steepest ascent?

And it's not just me: the comments also ask the person who answered how the heck their argument proves it. So why does it?

Further, in the book Electricity and Magnetism by Purcell (end of page 64 to the start of page 65), he discusses the gradient of a function that depends only on the distance from the origin, i.e. a radial function $f(r)$, and argues that the shortest step we can take to change $f(r)$ to $f(r + dr)$ is a step in the radial direction. OK, this somewhat makes sense to me, but how do I extend this intuition to ordinary derivatives and then use it to understand the answer I showed before?
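Restating Purcell's argument in symbols as I understand it (the notation is mine, not his): since $f$ depends only on $r = |\vec{r}|$, changing its value from $f(r)$ to $f(r + dr)$ requires changing $r$ by $dr$. A small step $d\vec{l}$ making angle $\theta$ with the radial direction changes $r$ by $$dr = \hat{r} \cdot d\vec{l} = |d\vec{l}| \cos\theta,$$ so the shortest step $|d\vec{l}|$ achieving a given $dr$ is the one with $\cos\theta = 1$, i.e. the purely radial one.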

Best Answer

We can first ask what is meant by the direction of "steepest ascent" for a function $f(\vec{x})$ around $\vec{x}_0$. What we usually mean by this is the direction given by a vector $\vec{\delta}$ such that a minute (infinitesimal) step in that direction gives the greatest increase of the function, while the length of that minute step is kept fixed. You can think of the following process as one that converges to the direction of steepest ascent:

  1. First start with all unit vectors $\vec{\delta}$, and ask: of all the possible $\vec{\delta}$, which one maximizes $f(\vec{x}_0 + \vec{\delta})$? Let the answer be $\vec{\delta}_1$.

  2. Then reduce the length of $\vec{\delta}$ to, say, $0.1$, and ask the same question: of all such possible $\vec{\delta}$, which one maximizes $f(\vec{x}_0 + \vec{\delta})$? Let the answer be $\vec{\delta}_2$.

  3. Continue to reduce the length of $\vec{\delta}$, fixing it at values arbitrarily close (but not equal) to zero, and ask the same question each time, thereby constructing an infinite sequence $$\vec{\delta}_1, \vec{\delta}_2, \vec{\delta}_3, \cdots$$ For differentiable functions, the direction in which $\vec{\delta}_i$ points approaches a well-defined limit as $i \to \infty$!

The limit of this infinite process can then be thought of as the direction of steepest ascent. It maximizes the increase of the function per unit distance traveled around the point $\vec{x}_0$, in the limit where the distance traveled from $\vec{x}_0$ is very small.
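As a sanity check, here is a minimal numerical sketch of this limiting process. The test function, the point $\vec{x}_0$, and the direction sampling are all my own illustrative choices, not part of the original argument: for shrinking step lengths, the code brute-forces over many unit directions and reports the one maximizing $f(\vec{x}_0 + \vec{\delta})$, which settles toward the direction of $\nabla f(\vec{x}_0)$.

```python
import numpy as np

def f(x, y):
    # an arbitrary smooth test function (my choice, for illustration)
    return x**2 * y + np.sin(y)

def grad_f(x, y):
    # its analytic gradient, for comparison
    return np.array([2 * x * y, x**2 + np.cos(y)])

x0 = np.array([1.0, 2.0])

# sample many unit vectors on the circle
angles = np.linspace(0, 2 * np.pi, 100_000, endpoint=False)
unit_dirs = np.column_stack([np.cos(angles), np.sin(angles)])

for step in [1.0, 0.1, 0.01, 0.001]:
    # evaluate f at x0 + step * (unit direction) for every sampled direction
    trials = x0 + step * unit_dirs
    values = f(trials[:, 0], trials[:, 1])
    best = unit_dirs[np.argmax(values)]
    print(f"step {step:>6}: best direction = {best}")

g = grad_f(*x0)
print("gradient direction:", g / np.linalg.norm(g))
```

Each printed "best direction" plays the role of one of the $\vec{\delta}_i$ (rescaled to unit length); as the step shrinks, they line up with the normalized gradient.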

To find exactly what direction $\vec{\delta}$ approaches, we can use the idea behind the definition of the derivative, which is to provide a linear approximation of a function around a point. The idea is that for a multivariate function $f(\vec{x})$ that has a gradient $\nabla f$, moving an infinitesimal amount $\vec{\delta}$ from $\vec{x}_0$ to $\vec{x}_0 + \vec{\delta}$ changes the value of the function from $f(\vec{x}_0)$ to $f(\vec{x}_0) + \nabla f(\vec{x}_0) \cdot \vec{\delta}$. Even though there will always be corrections of order $|\vec{\delta}|^2$ and beyond, they are negligible in the limit $\vec{\delta} \to \vec{0}$. That's because every differentiable function can be approximated as linear in a neighborhood of any point of its domain.
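To see concretely that the corrections beyond the linear term die off quadratically, here is a small check along the same lines (again, the function, the point, and the direction are arbitrary choices of mine):

```python
import numpy as np

def f(x, y):
    # another arbitrary smooth test function
    return np.exp(x) * np.cos(y)

def grad_f(x, y):
    return np.array([np.exp(x) * np.cos(y), -np.exp(x) * np.sin(y)])

x0 = np.array([0.5, 1.0])
unit = np.array([0.6, 0.8])  # an arbitrary fixed unit direction

for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    delta = h * unit
    actual = f(*(x0 + delta)) - f(*x0)
    linear = grad_f(*x0) @ delta
    # the discrepancy should shrink like h**2
    print(f"h = {h:.0e}   |actual - linear| = {abs(actual - linear):.3e}")
```

Each tenfold reduction in $h$ cuts the discrepancy by roughly a factor of $100$, which is the $|\vec{\delta}|^2$ scaling in action.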

The problem then becomes to find $\vec{\delta}$ such that $\nabla f(\vec{x}_0) \cdot \vec{\delta}$ is maximized (since this is the increase $f(\vec{x}_0 + \vec{\delta}) - f(\vec{x}_0)$ that we are measuring). But remember the caveat: the length of all the $\vec{\delta}$ must be held fixed. Because of this, finding the vector $\vec{\delta}$ that maximizes $\nabla f(\vec{x}_0) \cdot \vec{\delta}$ is equivalent to finding the unit vector $\hat{\delta}$ that maximizes $\nabla f(\vec{x}_0) \cdot \hat{\delta}$. From there you can check that choosing $\hat{\delta}$ to be the unit vector in the direction of $\nabla f(\vec{x}_0)$ maximizes this quantity.
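To make that final check explicit: if $\theta$ is the angle between $\nabla f(\vec{x}_0)$ and the unit vector $\hat{\delta}$, the geometric form of the dot product gives $$\nabla f(\vec{x}_0) \cdot \hat{\delta} = |\nabla f(\vec{x}_0)|\,|\hat{\delta}| \cos\theta = |\nabla f(\vec{x}_0)| \cos\theta \le |\nabla f(\vec{x}_0)|,$$ with equality exactly when $\theta = 0$, i.e. when $\hat{\delta}$ points along $\nabla f(\vec{x}_0)$. This is the unit-vector argument from the linked question, and it proves the claim precisely because the limiting process above shows that "steepest ascent" means nothing more than maximizing this dot product over unit vectors.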