Augmented Lagrangian

Tags: convex-optimization, lagrange-multiplier, optimization

Consider the following equality-constrained minimization problem:

\begin{align*} \text{minimize} \quad & f(x) \\ \text{subject to} \quad & Ax = b \end{align*}

Its Lagrangian is then:

$L(x,y) = f(x) + y^T(Ax-b)$

We can then use gradient ascent on the dual to solve this problem, though it is said to work only under fairly strong assumptions.
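For concreteness, here is a minimal NumPy sketch of dual gradient ascent on a toy instance (minimize $\frac{1}{2}\|x\|_2^2$ subject to $Ax=b$); the problem data and the step size `alpha` are illustrative choices of mine, not part of the question:

```python
import numpy as np

# Toy instance: minimize (1/2)||x||^2  subject to  Ax = b.
# For this particular f, the x-minimization of L(x, y) has a closed form.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)

y = np.zeros(3)
alpha = 0.05  # dual step size (illustrative; must be small enough to converge)
for _ in range(1000):
    x = -A.T @ y                  # x^{k+1} = argmin_x L(x, y^k), i.e. x + A^T y = 0
    y = y + alpha * (A @ x - b)   # gradient ascent on the dual: grad g(y) = Ax - b
print(np.linalg.norm(A @ x - b))  # constraint residual, should be near zero
```

In general each iteration requires a full minimization of $L(x,y)$ over $x$; the closed form above is special to this quadratic $f$.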

Its Augmented Lagrangian is defined as:

$L_{\rho}(x,y) = f(x) + y^T(Ax-b) + \frac{\rho}{2} \|Ax-b \|^2_2$

We can then use this in the method of multipliers (Hestenes 1969; Powell 1969), which is said to be designed to robustify dual ascent.
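Here is the matching sketch of the method of multipliers on the same toy instance (again illustrative data; the only changes from dual ascent are the penalty term in the $x$-update and the fact that the dual step size is now the penalty parameter $\rho$ itself):

```python
import numpy as np

# Same toy instance: minimize (1/2)||x||^2  subject to  Ax = b.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)

rho = 1.0  # penalty parameter (illustrative choice)
y = np.zeros(3)
for _ in range(50):
    # x^{k+1} = argmin_x L_rho(x, y^k): here (I + rho A^T A) x = A^T (rho b - y^k)
    x = np.linalg.solve(np.eye(5) + rho * A.T @ A, A.T @ (rho * b - y))
    y = y + rho * (A @ x - b)     # multiplier update with step size rho
print(np.linalg.norm(A @ x - b))  # residual, converging without step-size tuning
```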

Could anyone explain the intuition behind the augmented Lagrangian? As far as I can see, we are combining the penalty method with the method of Lagrange multipliers. But how does this bring robustness?

P.S. Boyd says that the benefit of including the penalty term is that $g_\rho$ (the dual function associated with the augmented Lagrangian) can be shown to be differentiable under rather mild conditions on the original problem.

How does adding the extra term help make the construction "more differentiable"? Or is he referring to the fact that, say, when we deal with the $\ell_1$ or nuclear norms, this extra term allows us to introduce a proximal operator?

Best Answer

I'll assume $f$ is convex. (I should probably also assume $f$ is closed and proper.)

As you said, the Lagrangian is $L(x,y) = f(x) + \langle y, Ax - b \rangle$. The dual function is \begin{align*} g(y) &= \inf_x \, L(x,y) \\ &= \inf_x \, \big( f(x) - \langle -A^Ty, x \rangle \big) - \langle y, b \rangle \\ &= - \sup_x \, \big( \langle -A^Ty, x \rangle - f(x) \big) - \langle y, b \rangle \\ &= - f^*(-A^T y) - \langle y, b \rangle. \end{align*} Here $f^*$ is the convex conjugate of $f$.

The dual problem, expressed as a minimization problem, is \begin{equation*} \text{minimize} \quad f^*(-A^T y) + \langle y, b \rangle. \end{equation*}

Note that the dual function might not be differentiable! That is a key point. To guarantee that $f^*$ is differentiable, we need $f$ to be strictly convex, which is often not the case. This often prevents us from simply solving the dual problem with gradient ascent.
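A concrete example: take $f(x) = \|x\|_1$, which is convex but not strictly convex. Its conjugate is the indicator of the $\ell_\infty$ unit ball, \begin{equation*} f^*(z) = \begin{cases} 0 & \text{if } \|z\|_\infty \le 1, \\ +\infty & \text{otherwise,} \end{cases} \end{equation*} so the dual objective $f^*(-A^Ty) + \langle y, b \rangle$ is linear on the polyhedron $\{y : \|A^Ty\|_\infty \le 1\}$ and $+\infty$ outside it, hence not differentiable.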

So what can we do if the dual function is not differentiable? How can we minimize a nondifferentiable function? One option is to use the proximal point method. The proximal point method does not require the objective function to be differentiable.
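Concretely, to minimize a convex function $h$, the proximal point method iterates \begin{equation*} y^{k+1} = \operatorname*{argmin}_y \; h(y) + \frac{1}{2\rho} \|y - y^k\|_2^2. \end{equation*} The added quadratic makes each subproblem strongly convex, so the iteration is well defined even when $h$ is nonsmooth.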

If you work out the details, you will find that the augmented Lagrangian method is what you get when you solve the dual problem with the proximal point method.
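Here is a sketch of those details (the exchange of max and inf is justified under mild conditions). A proximal point step on the dual problem $\text{maximize } g(y)$ reads \begin{align*} y^{k+1} &= \operatorname*{argmax}_y \; g(y) - \frac{1}{2\rho}\|y - y^k\|_2^2 \\ &= \operatorname*{argmax}_y \, \inf_x \; f(x) + \langle y, Ax - b \rangle - \frac{1}{2\rho}\|y - y^k\|_2^2. \end{align*} Exchanging the max and the inf, the inner maximization over $y$ is attained at $y = y^k + \rho(Ax - b)$; substituting this back in leaves \begin{equation*} x^{k+1} = \operatorname*{argmin}_x \; f(x) + \langle y^k, Ax - b \rangle + \frac{\rho}{2}\|Ax - b\|_2^2 = \operatorname*{argmin}_x \; L_\rho(x, y^k), \end{equation*} followed by the update $y^{k+1} = y^k + \rho(Ax^{k+1} - b)$. These are exactly the two steps of the method of multipliers.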

Vandenberghe's 236c notes are a good source for this material.
