Derivation of Lagrange Multipliers


I'm reading Mathematical Methods for Physics and Engineering by Riley, Hobson, and Bence, and here is an excerpt from their derivation of Lagrange multipliers:

"To maximize $f$ we require
$$ \mathrm d f = \frac{\partial f}{\partial x} \mathrm dx + \frac{\partial f}{\partial y} \mathrm dy = 0.$$
If $\mathrm dx$ and $\mathrm dy$ were independent, we could conclude that $f_x = 0 = f_y$. However, they are not independent, but constrained because $g$ is constant:
$$ \mathrm d g = \frac{\partial g}{\partial x} \mathrm dx + \frac{\partial g}{\partial y} \mathrm dy = 0. $$
Multiplying $\mathrm dg$ by an as yet unknown number $\lambda$ and adding it to $\mathrm df$ we obtain
$$ \mathrm d(f + \lambda g) = \left( \frac{\partial f}{\partial x} + \lambda \frac{\partial g}{\partial x} \right)\mathrm dx + \left( \frac{\partial f}{\partial y} + \lambda \frac{\partial g}{\partial y} \right)\mathrm dy = 0,$$
where $\lambda$ is the Lagrange undetermined multiplier. In this equation $\mathrm dx$ and $\mathrm dy$ are to be independent and arbitrary; we must therefore choose $\lambda$ such that
$$ \frac{\partial f}{\partial x} + \lambda \frac{\partial g}{\partial x} = 0,$$
$$ \frac{\partial f}{\partial y} + \lambda \frac{\partial g}{\partial y} = 0.$$

…"

What I don't understand is how they can go from saying that $\mathrm dx$ and $\mathrm d y$ are not independent due to the path constraint given by $g(x,y)$ to stating that $\mathrm dx$ and $\mathrm d y$ must be independent and arbitrary in the $\mathrm d(f+\lambda g) = 0 $ equation, implying the Lagrange multiplier equations. To my understanding, $\mathrm d f = 0$ imposes a constraint on a stationary point, whereas $\mathrm d g = 0$ imposes a constraint on a path, thereby making $\mathrm dx$ and $\mathrm dy$ depend on each other. How can this dependence be lost just by adding the two together?

Best Answer

What is actually happening in terms of differentials is not quite what they wrote. Treating $x$, $y$, and $\lambda$ all as independent variables of the Lagrangian $L = f + \lambda g$, and assuming the constraint is $g=0$, it is actually:

$$\mathrm dL=\mathrm d(f+\lambda g)=(f_x + \lambda g_x)\,\mathrm dx + (f_y + \lambda g_y)\,\mathrm dy + g\,\mathrm d\lambda = 0.$$

The Lagrange condition says that a stationary point of $f$ subject to $g=0$ satisfies this relation without any explicit dependence between $x$ and $y$: setting the coefficients of $\mathrm dx$, $\mathrm dy$, and $\mathrm d\lambda$ to zero gives the two multiplier equations together with the constraint $g=0$ itself. The constraint is enforced indirectly by the freedom to vary $\lambda$, rather than by restricting the search to the surface $g=0$ in the first place.

Note that I have not shown here that the Lagrange condition actually works; I have only explained why a stationary point of the Lagrangian automatically satisfies the constraint.
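The point above can be checked numerically. Below is a minimal sketch using a hypothetical example of my own choosing (not from the answer): maximize $f(x,y)=x+y$ on the unit circle $g(x,y)=x^2+y^2-1=0$, whose constrained maximum is $x=y=1/\sqrt2$ with $\lambda=-1/(2x)$. All three partials of $L$, including the one with respect to $\lambda$, vanish there:

```python
import math

# Hypothetical example: f(x, y) = x + y, constraint g(x, y) = x^2 + y^2 - 1 = 0.
def f(x, y):
    return x + y

def g(x, y):
    return x**2 + y**2 - 1

def L(x, y, lam):
    """Lagrangian with x, y, and lambda all treated as independent variables."""
    return f(x, y) + lam * g(x, y)

def grad_L(x, y, lam, h=1e-6):
    """Central-difference partials of L with respect to x, y, and lambda."""
    dLdx = (L(x + h, y, lam) - L(x - h, y, lam)) / (2 * h)
    dLdy = (L(x, y + h, lam) - L(x, y - h, lam)) / (2 * h)
    dLdl = (L(x, y, lam + h) - L(x, y, lam - h)) / (2 * h)
    return dLdx, dLdy, dLdl

# Constrained maximum: x = y = 1/sqrt(2), with lambda = -1/(2x).
x = y = 1 / math.sqrt(2)
lam = -1 / (2 * x)

print(grad_L(x, y, lam))  # all three partials vanish, up to rounding
```

The vanishing of the $\lambda$-partial is exactly the statement that the constraint $g=0$ holds at the stationary point.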

To me, assuming you don't want to get into the formalism of differential geometry, the Lagrange condition is most easily understood by talking about perpendicular vectors. To have stationarity, $\nabla f$ must be perpendicular to any direction in which you are allowed to move while preserving the constraint, and you are allowed to move in directions for which the directional derivative of $g$ is zero.

These are precisely the directions perpendicular to $\nabla g$. So for stationarity, $\nabla f$ must be perpendicular to every vector that is perpendicular to $\nabla g$. In 2D one can see geometrically that this forces the two gradients to be parallel (rotating a vector twice by $\pm \pi/2$ either returns it unchanged or negates it). A little linear algebra shows that this holds in any dimension, and also reveals how the generalization to several scalar equality constraints works.
