[Math] Steepest Descent in elliptical error surface

convex optimization, gradient descent, neural networks, optimization

I am watching the Neural Network videos by Prof. Geoff Hinton. In them, he talks about the problem with elliptical error surfaces.

In particular, he says, if the error surface is very elliptical, the direction of steepest descent is perpendicular to the direction towards the minimum.

My understanding of steepest descent was that it is perpendicular to the error surface and points in the direction of the minimum.

Slides:
[image: lecture slide]
[image: lecture slide]

Link to timestamped videos:
https://youtu.be/tIovUOirJkE?list=PLoRl3Ht4JOcdU872GhiYWf6jwrk_SNhz9&t=224
https://youtu.be/4BZBog1Zx6c?list=PLoRl3Ht4JOcdU872GhiYWf6jwrk_SNhz9&t=66

Questions:
1) By definition, shouldn't steepest descent point in the direction of the minimum?
2) Can you help me understand why steepest descent is perpendicular to the direction of the minimum in the case of an elliptical error surface, but points towards the minimum in the case of a circular error surface?

PS: responding to Amakelov's answer (modifying question, since I can't upload images in comment):

[image: elongated elliptical error bowl]

Let's assume this is my elongated elliptical error surface bowl. Here, the steepest descent direction will have a large y component, a small x component, and a variable z component (depending on how low in the error bowl I am).

So, if I look at the top-down view, I will see a big movement along the y axis and a small movement along the x axis, but the movement will still be in the direction of the minimum (and not perpendicular to it). What am I missing here?

Best Answer

Here is a figure that may help. I drew a long ellipse, which is intended to be one contour line, and a gradient vector that I drew by eye. The gradient vector is always perpendicular to the contour lines. It should be clear that the vector does not point toward the center of the ellipse. It is also not really close to perpendicular to the direction toward the center of the ellipse. I think the slides you cite exaggerate the perpendicularity. It does get worse if the ellipse is even longer than I have drawn.

[figure: a long elliptical contour line with a gradient vector drawn perpendicular to it]

Added: To my eye, the gradient is closer to $45^\circ$ from the direction to the minimum than perpendicular. I think this is correct. If you minimize along the direction of the gradient, you will cross the axis of the ellipse and stop at a point where your direction of travel is along the contour line. You stop and turn a right angle, as I pointed out in my answer to your other question. Now you are again following the gradient and again about $45^\circ$ from the direction to the minimum. You will zigzag across the axis. The good news is that your total travel is only $\sqrt 2$ times as far as the direct route. The bad news is that each turn requires evaluating the gradient, which can be expensive.

The point I picked is about the worst case. If you start closer to the end of the ellipse, the gradient will be closer to the direction to the center. If you start farther from the end, the first step will be farther from the direction to the center, but the next will be better.
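If you want to check the picture numerically, here is a minimal sketch. It assumes a quadratic bowl $f(x,y)=\tfrac12(ax^2+by^2)$ with its minimum at the origin; that bowl is my own stand-in for the error surface, not something taken from the slides, and all function names are made up for illustration. It measures the angle between the steepest descent direction and the straight line to the minimum, scans one contour for the largest such angle, and then runs steepest descent with an exact line search to show the right-angle zigzag.

```python
import math

def f(x, y, a, b):
    """Assumed error bowl: f(x, y) = 0.5 * (a*x**2 + b*y**2), minimum at (0, 0)."""
    return 0.5 * (a * x * x + b * y * y)

def gradient(x, y, a, b):
    """Gradient of the assumed bowl."""
    return a * x, b * y

def angle_to_minimum_deg(x, y, a, b):
    """Angle between steepest descent (-gradient) and the direct route (-x, -y)."""
    gx, gy = gradient(x, y, a, b)
    cos_theta = (gx * x + gy * y) / (math.hypot(gx, gy) * math.hypot(x, y))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
    return math.degrees(math.acos(cos_theta))

def worst_angle_on_contour_deg(a, b, n=900):
    """Scan the first quadrant of the contour a*x**2 + b*y**2 = 1 for the largest angle."""
    worst = 0.0
    for i in range(1, n):
        t = (math.pi / 2) * i / n
        x = math.cos(t) / math.sqrt(a)
        y = math.sin(t) / math.sqrt(b)
        worst = max(worst, angle_to_minimum_deg(x, y, a, b))
    return worst

def steepest_descent_exact(x, y, a, b, steps):
    """Steepest descent where each step minimizes f exactly along -gradient."""
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = gradient(x, y, a, b)
        curvature = a * gx * gx + b * gy * gy  # g^T A g for A = diag(a, b)
        if curvature == 0.0:
            break  # already at the minimum
        alpha = (gx * gx + gy * gy) / curvature  # exact minimizer for a quadratic
        x, y = x - alpha * gx, y - alpha * gy
        path.append((x, y))
    return path

# Worst-case angle for increasingly elongated bowls (b/a is the curvature ratio).
for ratio in (1.0, 4.0, 25.0, 100.0):
    print(ratio, round(worst_angle_on_contour_deg(1.0, ratio), 1))

# Zigzag path on an elongated bowl, starting from an arbitrary point.
for px, py in steepest_descent_exact(10.0, 1.0, 1.0, 100.0, 5):
    print(round(px, 4), round(py, 4))
```

For this particular bowl the contours have axis ratio $\sqrt{b/a}$. With $a=b$ (circular contours) the angle is zero everywhere, and the worst-case angle climbs toward $90^\circ$ as $b/a$ grows, which is the "it gets worse if the ellipse is longer" effect. The path printout should show consecutive steps at right angles to each other, since an exact line search stops where the travel direction is tangent to a contour.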
