Need help with a simple example where it’s not clear that the gradient is in the direction of “steepest ascent”

Tags: gradient descent, intuition, multivariable-calculus, vector analysis

Say I am at a point $(x^*,y^*)$ of a function $f(x,y)$ where the function value increases if I take a very small step in any positive direction (i.e. in the direction of a vector whose $x$ and $y$ coordinates are both positive), but the function increases MORE if I take a very small step in another direction, say along a vector whose $x$-coordinate is positive but whose $y$-coordinate is negative. Doesn't that mean that the gradient does not point in the direction of steepest ascent?

There was a great answer in this thread about seeing the region around the point as "almost planar", but I still don't see why the function can't be differentiable at that point and still increase in both directions (even if it's by an infinitesimal amount), and increase just a little bit more in one direction than in another. Does it really HAVE to mean that there is a sharp turn just at that point? Why can't it be smooth but still not planar?

I have drawn two examples where I am imagining that the point I am evaluating the gradient at is $(0,0)$. From there, it is supposed to be steeper to go in the direction of $(-ax, -by)$ than $(ax, by)$:

Example 1

Example 2

I am fairly new to math and very technical explanations are still hard for me to understand. I know I am asking for a lot, but additional ways of looking at it which are not algebraic would help me the most.

Thanks.

Best Answer

If a function is increasing in one direction, but increasing faster in another direction, it does not mean the gradient is not the direction of steepest increase; it means the first direction is not the direction of the gradient.


If the function is differentiable, it can have "bumps" and/or concave/convex "bowls" like the ones you've drawn, but if you zoom in very close to the point $(x^*,y^*)$ those bumps or concavities will become less and less visible until you see something that looks more like a tilted plane.
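The "looks like a tilted plane" picture can also be written as one formula, for readers who want it. This is the standard definition of differentiability, not something specific to this question; the unit direction $u=(u_1,u_2)$, the step size $h$, and the angle $\varphi$ between $u$ and the gradient are notation introduced here purely for illustration.

$$f(x^*+hu_1,\;y^*+hu_2) = f(x^*,y^*) + h\,\nabla f(x^*,y^*)\cdot u + (\text{an error that shrinks faster than } h), \qquad \nabla f(x^*,y^*)\cdot u = \lVert \nabla f(x^*,y^*)\rVert\cos\varphi.$$

The term $h\,\nabla f(x^*,y^*)\cdot u$ is the tilted plane. For small enough $h$ it dominates the error, and (when the gradient is nonzero) it is largest when $\varphi=0$, that is, when $u$ points along the gradient.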

But let's look more closely at a tilted plane. If you take a flat plane and tilt it, only one line in the plane stays at the original height of the plane. Every other point in the plane is either raised or lowered. The points that are raised are all on the same side of that line. So if the $x,y$ plot of that line passes through $(x^*,y^*),$ a movement in any direction that stays on the "raised" side of the line will produce an increase in height.

For a specific example, suppose the $x,y$ plot of the "original height" line makes an angle of $80$ degrees counterclockwise from the positive $x$ axis, and going in the direction of the positive $x$ axis the height of the plane is increasing. With that setup, you will have an increasing height of the plane in any direction you go as long as the direction is between $0$ and $80$ degrees counterclockwise from the positive $x$ axis, or between $0$ and $100$ degrees clockwise from the positive $x$ axis. That's a $180$-degree range of directions all with increasing heights. (The other $180$-degree range of directions has decreasing heights.)

The height of the plane in this example will increase the fastest if you go at an angle $10$ degrees clockwise from the positive $x$ axis, which is a direction vector with positive $x$ but negative $y$ coordinate. That's the direction of the gradient of that plane (and the direction of the gradient, at $(x^*,y^*),$ of any two-variable function whose tangent plane there is that plane). But the height of the plane will also increase if you go at an angle $10$ degrees counterclockwise from the positive $x$ axis, where the direction vector has both positive $x$ and $y$ coordinates. It just won't increase quite as fast.
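The numbers in this example can be checked with a short Python sketch. It assumes a plane whose gradient points $10$ degrees clockwise from the positive $x$ axis; the gradient length of $1$ and the function names are arbitrary choices made here for illustration, not part of the original example.

```python
import math

# Assumed setup: the gradient points 10 degrees clockwise from the positive
# x axis (angle -10 when measured counterclockwise). The magnitude 1.0 is an
# arbitrary illustrative choice.
GRAD_ANGLE_DEG = -10.0
GRAD_MAGNITUDE = 1.0


def gradient():
    """Return the (x, y) components of the assumed gradient vector."""
    a = math.radians(GRAD_ANGLE_DEG)
    return (GRAD_MAGNITUDE * math.cos(a), GRAD_MAGNITUDE * math.sin(a))


def slope_in_direction(angle_deg):
    """Rate of height increase when moving at angle_deg (counterclockwise from +x).

    This is the dot product of the gradient with the unit direction vector.
    """
    gx, gy = gradient()
    t = math.radians(angle_deg)
    return gx * math.cos(t) + gy * math.sin(t)


def steepest_direction(step_deg=1):
    """Scan whole-degree directions and return the one with the largest slope."""
    return max(range(-180, 180, step_deg), key=slope_in_direction)


if __name__ == "__main__":
    for angle in (-100, -10, 0, 10, 80, 170):
        print(f"{angle:5d} deg: slope = {slope_in_direction(angle):+.3f}")
    print("steepest direction:", steepest_direction(), "deg")
```

Under these assumptions the slope is positive at both $+10$ and $-10$ degrees, essentially zero along the level line at $80$ and $-100$ degrees, and largest at $-10$ degrees.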

If you're looking in two exactly opposite directions, with direction vectors such as $(a,b)$ and $(-a,-b)$, and you see increases in both directions, then you're in one of the following situations:

  • The steps you are making away from $(x^*,y^*)$ are too large. If you make the steps small enough, one of the directions will show a decrease instead of an increase.
  • You are looking at a point where the derivative is exactly zero, the tangent plane is exactly horizontal, and the gradient doesn't have a direction.
  • You are looking at a point where the function is not differentiable.
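The first situation in the list (steps that are too large) can be illustrated with a small Python sketch. The function `f` below is a hypothetical example chosen here, a tilted plane plus a bowl; it is not taken from the question's drawings.

```python
def f(x, y):
    # Hypothetical smooth function: a tilted plane (x) plus a bowl (5 * r^2).
    return x + 5.0 * (x * x + y * y)


def change(direction, h):
    """Change in f when stepping a distance h from (0, 0) along direction."""
    dx, dy = direction
    return f(h * dx, h * dy) - f(0.0, 0.0)


def opposite_changes(direction, h):
    """Changes in f for a step of size h along direction and along its opposite."""
    dx, dy = direction
    return change((dx, dy), h), change((-dx, -dy), h)


if __name__ == "__main__":
    for h in (1.0, 0.1, 0.01):
        forward, backward = opposite_changes((1.0, 0.0), h)
        print(f"h = {h}: forward change = {forward:+.4f}, backward change = {backward:+.4f}")
```

With a step of size $1$ this function increases in both opposite directions, because the bowl dominates. With a step of size $0.01$ the tilted-plane part dominates, and the backward step shows a decrease.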