[Math] Intuition behind Lagrange multiplier

lagrange-multiplier, multivariable-calculus, optimization

I noticed that all attempts at showcasing the intuition behind Lagrange multipliers basically resort to the following example (taken from Wikipedia):
[figure: concentric level curves of f, labeled d1, d2, d3, touching the constraint curve g]

The reason such examples make sense is that the level curves of f are either only decreasing (d1 < d2 < d3) or only increasing (d1 > d2 > d3) as they move outward concentrically, so it's obvious that the innermost level curve touching the constraint curve gives the minimum/maximum we are looking for.

But in real-life examples, I imagine we can have a function f whose level curves might look like this (they don't decrease/increase in an orderly fashion):

[figure: level curves of f that do not increase or decrease monotonically, with the constraint curve g tangent to one of them at a blue point and crossing others at two green points]

In the above example that I thought of, the maximum of f (subject to the constraint g) would be attained not at the blue point (where the constraint curve is tangent to a level curve of f) but at the two green points.

I haven't seen this kind of level-curve arrangement in any tutorial/lecture/course, and it seems to me that every demonstration conveniently picks the most favorable scenario for presenting the intuition. I must be wrong somewhere, but I can't figure out where.

Best Answer

For my part, I don’t find that way of explaining the Lagrange multiplier method particularly enlightening. Instead, I like to think of it in terms of directional derivatives. Suppose we have $f:\mathbb R^2\to\mathbb R$ and wish to find the critical points of $f$ restricted to the curve $\gamma$. If we “straighten out” $\gamma$, this becomes the familiar problem of finding the critical points of a single-variable function, which occur where its derivative vanishes. The analog of this condition in the original, non-straightened setting is that the directional derivative of $f$ in a direction tangent to $\gamma$ vanishes: loosely, $\nabla f\cdot \gamma'=0$.

Now, the gradient of a function is normal to its level curves, so if $\gamma$ is given as the level curve of a function $g:\mathbb R^2\to\mathbb R$, then this condition is equivalent to the two gradients being parallel, i.e., $\nabla f=\lambda\nabla g$.

This idea generalizes to higher dimensions: we have instead a level (hyper)surface of $g$ and want the derivative of $f$ in every direction tangent to $g$ to vanish, which is equivalent to the gradients of the two functions being parallel. For suitably well-behaved functions $g$, the above conceptual “flattening out” of the hypersurface is justified in a neighborhood of a point by the Implicit Function Theorem.
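As a concrete sanity check (my own toy example, not from the answer above), here is a short Python sketch verifying that the condition $\nabla f=\lambda\nabla g$ holds at the constrained maximum of $f(x,y)=xy$ on the line $g(x,y)=x+y-1=0$. Solving the Lagrange conditions by hand gives $x=y=1/2$ with $\lambda=1/2$:

```python
# Toy example: check that grad f is parallel to grad g at the constrained
# maximum of f(x, y) = x*y subject to g(x, y) = x + y - 1 = 0.

def grad_f(x, y):
    # gradient of f(x, y) = x*y
    return (y, x)

def grad_g(x, y):
    # gradient of g(x, y) = x + y - 1
    return (1.0, 1.0)

x, y = 0.5, 0.5                 # candidate critical point found by hand
gfx, gfy = grad_f(x, y)
ggx, ggy = grad_g(x, y)

# Two plane vectors are parallel iff their 2x2 determinant vanishes.
cross = gfx * ggy - gfy * ggx
lam = gfx / ggx                 # the multiplier lambda

print(cross)  # 0.0  -> gradients are parallel
print(lam)    # 0.5  -> lambda = 1/2
```

The determinant test `gfx * ggy - gfy * ggx == 0` is just the two-dimensional way of saying the gradients point along the same line, which is exactly the tangency condition in the answer.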
