[Math] How to interpret Lagrangian function (specifically not Lagrangian multiplier)

Tags: lagrange-multiplier, optimization

I am reading the following tutorial on Lagrange multipliers (http://www.cs.berkeley.edu/~klein/papers/lagrange-multipliers.pdf). My goal is to gain an intuitive understanding of why the Lagrangian method works. In summary, the tutorial linked above and other tutorials have helped me understand that constrained optimization boils down to requiring that the gradient of the function to be maximized be parallel to the gradient of the constraint function. This leads us to formulate the Lagrangian function as follows:

$$
\Lambda (x,\lambda ) =f(x)-\lambda g(x) .\tag 1
$$

where we are looking for points that satisfy

$$
\nabla \Lambda (x,\lambda ) = 0 .\tag 2
$$

I understand we can extend this to multiple dimensions, and eventually also handle inequality constraints using the KKT conditions … but my question is about how to interpret the Lagrangian function $\Lambda(x,\lambda)$ itself.

The reason this question arises for me is that the tutorial goes on to explain that we can look at the Lagrangian as an "encoding of the problem" (Section 3.1 in the document). Quoting from the document: "You could imagine using the Lagrangian to do constrained maximization in the following way. You move $x$ around $\mathbb{R}^n$ looking for a maximum value of $f$. However, you have no control over $\lambda$, which gets set in the worst possible way for you. Therefore, when you choose $x$, $\lambda$ is chosen to minimize $\Lambda$." Formally, then, the problem is to find the $x$ which gives

$$
{f}^{*} = \max_{x} \left( \min_{\lambda} \Lambda (x,\lambda ) \right) .\tag 3
$$

I understand that conceptually we want to find the maximum value of $f(x)$ subject to $g(x)=0$, so moving $x$ around $\mathbb{R}^n$ to find the maximum makes sense. It is also clear to me that when the constraint is met, $g(x)=0$ and hence $\Lambda(x,\lambda)=f(x)$ for every $\lambda$. So over the feasible set, ${f}^{*}=\max_{x} f(x)$, and we are indeed finding the maximum of the original objective. When the constraint is not met, $g(x)\neq 0$, and minimizing $\Lambda(x,\lambda)$ over $\lambda$ gives $-\infty$, so maximizing over such points doesn't mean much.

My question is: what is preventing us from maximizing $\Lambda(x,\lambda)$ over $\lambda$ instead? If the constraint is met, we would still find the maximum value of $f(x)$, and if it is not met, the inner maximum would be $+\infty$. Is it as simple as this: we are trying to prevent the overall maximum from being infinity? Another way to state my question is: why is Equation (3) above a valid formulation of the problem, and NOT Equation (4) below:

$$
{f}^{*} = \max_{x} \left( \max_{\lambda} \Lambda (x,\lambda ) \right) .\tag 4
$$

(Equation (4) is NOT correct.)
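For concreteness, here is a small numerical sketch I put together (the toy $f$, $g$, and the finite grid of $\lambda$ values are my own illustrative stand-ins, not from the tutorial):

```python
import numpy as np

# Toy problem (chosen for illustration): maximize f(x) = -(x-2)^2
# subject to g(x) = x - 1 = 0.  The constrained maximizer is x = 1.
def f(x):
    return -(x - 2.0) ** 2

def g(x):
    return x - 1.0

def Lag(x, lam):
    return f(x) - lam * g(x)

lams = np.linspace(-1e3, 1e3, 20001)   # finite stand-in for lambda ranging over all of R

for x in (1.0, 1.5):                   # x = 1.0 is feasible, x = 1.5 is not
    inner_min = Lag(x, lams).min()     # the inner problem in Equation (3)
    inner_max = Lag(x, lams).max()     # the inner problem in Equation (4)
    print(f"x={x}: g(x)={g(x):+.2f}  min over lam={inner_min:.2f}  max over lam={inner_max:.2f}")
```

At the feasible point both inner values equal $f(x)$ exactly, since the $\lambda$ term vanishes. At the infeasible point the inner minimum drops without bound as the $\lambda$ grid widens, so Equation (3) discards that $x$; the inner maximum grows without bound, so Equation (4) would wrongly crown every infeasible point with value $+\infty$.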

I realize that if I just wanted to crunch out Lagrangians I probably don't really need to understand this, but for my application the "dual" problem is important, so a good understanding of this is important to me.

Thanks in advance!

Best Answer

So suppose you want to maximize $f(x)$ subject to the constraint $g(x) = 0$. One way to think about this is to maximize $f(x) - K (g(x))^2$, where $K$ is very large, and then see what happens as $K \to \infty$.

So to optimize $f(x) - K (g(x))^2$, we obtain $$ \nabla f(x) - 2K g(x) \cdot \nabla g(x) = 0 .\tag 1$$ If $K$ is very large, then at any reasonable solution the product $2K g(x)$ must remain of moderate size. Since $K$ is large, $g(x)$ must therefore be very small, that is, effectively zero.
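Here is a sketch of this limiting process on a toy problem (the specific $f$ and $g$ are my own illustrative choices): maximize $f(x) = -(x-2)^2$ subject to $g(x) = x - 1 = 0$. Setting the derivative of the penalized objective $f(x) - K\,g(x)^2$ to zero gives the closed-form maximizer $x(K) = (2+K)/(1+K)$.

```python
# Penalized toy problem:  maximize  -(x-2)^2 - K*(x-1)^2.
# Setting the derivative -2(x-2) - 2K(x-1) to zero yields x(K) = (2+K)/(1+K).
def x_of_K(K):
    return (2.0 + K) / (1.0 + K)

for K in (1.0, 10.0, 100.0, 1e4, 1e6):
    x = x_of_K(K)
    # g(x) -> 0 while 2K*g(x) converges to the Lagrange multiplier
    print(f"K={K:>9.0f}  x={x:.6f}  g(x)={x - 1.0:.2e}  2K*g(x)={2 * K * (x - 1.0):.6f}")
```

As $K \to \infty$ the maximizer converges to the feasible point $x = 1$, and $2K\,g(x)$ converges to $2$, which matches the multiplier obtained directly from $\nabla f = \lambda \nabla g$ at $x = 1$ (namely $\lambda = -2(1-2) = 2$).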

Then we see that the Lagrange multiplier is $\lambda = \lim_{K\to\infty} 2K g(x)$ (remembering that the optimizing point $x$, and hence $g(x)$, depends upon $K$). Thus the Lagrange multiplier measures the extent to which the system tries to push $g(x)$ away from zero.

So, for example, if you solve an Euler-Lagrange equation to find the equations of motion of a system of particles held together by rigid rods, then the Lagrange multiplier represents the internal tension in the rigid rods required to keep the particles in their fixed relative positions.

P.S. in equation (1), the dot product represents $$ \frac{\partial f}{\partial x_i} - \sum_j 2 K g_j \frac{\partial g_j}{\partial x_i} = 0 .$$ Thus $\lambda$ is a vector with the same dimension as the output of $g$.
