Optimization – How the Lagrangian Relates to the Perturbation Function

convex-optimization, convex-analysis, nonlinear-optimization, optimization

Given a convex programming problem
$$\begin{align*}
\text{minimize} &\quad f(x) &\\
\text{such that} &\quad g_i(x) \leq 0 & i=1\dots k\\
& \quad h_j(x) = 0 & j= k+1\dots n
\end{align*}$$
the Lagrangian is
$$L(x, \lambda, \nu) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x).$$
On the Wikipedia article on duality, as well as in this math.stackexchange thread, duality is explained in terms of a convex perturbation function $\phi : X \times Y \to \mathbb{R} \cup \{\infty\}$. In the example above

$$\begin{equation*}
\phi(x,y) =
\begin{cases}
f(x) \quad \text{if } g_i(x) +y_i \leq 0 \text{ and } y_j \leq h_j(x) \leq -y_j \\
\infty \quad \text{otherwise}.
\end{cases}
\end{equation*}$$

(This might already be incorrect.)
In this presentation, the primal is

$$ \min_x \phi(x,0) $$

and the dual

$$ \max_{y^*} -\phi^*(0,y^*).$$

Similarly with the Lagrangian, the primal is

$$ \min_{x} \max_{\lambda \geq 0, \nu} L(x, \lambda, \nu) $$

with dual

$$ \max_{\lambda \geq 0, \nu} \min_{x} L(x, \lambda, \nu).$$
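To make the two orders of optimization tangible, here is a small numeric sanity check on a toy problem of my own (minimize $x^2$ subject to $x \geq 1$, so $g(x) = 1 - x$ and there is no equality constraint); the bounded grids only approximate the inner suprema and infima:

```python
# Toy problem (my own example): minimize x^2 subject to x >= 1,
# i.e. f(x) = x^2, g(x) = 1 - x <= 0, Lagrangian L(x, lam) = x^2 + lam*(1 - x).
import numpy as np

xs = np.linspace(-3.0, 3.0, 2001)    # grid over the primal variable x
lams = np.linspace(0.0, 10.0, 2001)  # grid over lam >= 0 (truncated at 10)

# L on the grid: rows index x, columns index lam
L = xs[:, None] ** 2 + lams[None, :] * (1.0 - xs[:, None])

primal = L.max(axis=1).min()  # min_x max_{lam >= 0} L(x, lam)
dual = L.min(axis=0).max()    # max_{lam >= 0} min_x L(x, lam)

print(primal, dual)  # both approach 1.0; weak duality gives dual <= primal
```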

It seems that the Lagrangian and the perturbation function serve very similar purposes. What is the connection between $L$ and $\phi$?

Best Answer

I wrote the post you linked to, so I'll take a stab at this.

Short answer: In terms of $\phi$, the dual problem is to maximize $-\phi^*(0,z)$ with respect to $z$. But \begin{align*} -\phi^*(0,z) &= - \sup_{x,y} \, \bigl( \langle 0,x \rangle + \langle z, y \rangle - \phi(x,y) \bigr) \\ &= \inf_x \, \inf_y \, \bigl( \phi(x,y) - \langle z, y \rangle \bigr). \end{align*} The "Lagrangian" is the function \begin{equation*} L(x,z) = \inf_y \, \bigl( \phi(x,y) - \langle z, y \rangle \bigr). \end{equation*} The "dual function" $G(z) = -\phi^*(0,z)$ is then obtained by minimizing $L(x,z)$ with respect to $x$, as we are accustomed to. This expression for the Lagrangian appears on p. 54 of Ekeland and Temam, *Convex Analysis and Variational Problems*.
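To see this in action, here is a numeric sketch on an assumed toy instance (not from either thread): take $\phi(x,y) = x^2$ if $1 - x + y \leq 0$ and $\infty$ otherwise, i.e. $f_0(x) = x^2$ with the single constraint $g(x) = 1 - x \leq 0$. The partial conjugate $\inf_y \, \bigl( \phi(x,y) - \langle z, y \rangle \bigr)$ should then collapse to the familiar $f_0(x) + z\, g(x)$ whenever $z \geq 0$:

```python
# Sketch (toy instance): phi(x, y) = x^2 if 1 - x + y <= 0, else +inf.
# Check that L(x, z) = inf_y { phi(x, y) - z*y } equals f0(x) + z*g(x) for z >= 0.
import numpy as np

def phi(x, y):
    return x**2 if 1.0 - x + y <= 0.0 else np.inf

ys = np.linspace(-50.0, 50.0, 20001)  # crude grid approximating the inf over y

def L_general(x, z):
    return min(phi(x, y) - z * y for y in ys)

def L_familiar(x, z):
    return x**2 + z * (1.0 - x)

for x, z in [(0.5, 1.0), (2.0, 0.3), (-1.0, 2.0)]:
    print(L_general(x, z), L_familiar(x, z))  # agree up to grid resolution
# For z < 0, L_general is -inf in the limit: phi(x, y) - z*y decreases
# without bound as y -> -infinity, matching the dual constraint z >= 0.
```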

That's what the Lagrangian looks like in the general setting. The significance of the Lagrangian is that for convex problems, under mild assumptions, $x$ and $z$ are primal and dual optimal if and only if $(x,z)$ is a saddle point of the Lagrangian. We sometimes think of convex optimization problems as coming in pairs (primal and dual problem), but actually they come in triples if we remember to include the saddle point problem. If we only knew about the primal problem and the dual problem, but not the saddle point problem, then we'd be missing $1/3$ of the picture.
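A minimal numeric illustration of the saddle-point characterization, on the same assumed toy problem $\min\{x^2 : x \geq 1\}$, whose KKT point is $x^* = 1$, $z^* = 2$:

```python
# Saddle point check for the toy problem min { x^2 : x >= 1 }:
# L(x*, z) <= L(x*, z*) <= L(x, z*) for all x and all z >= 0.
import numpy as np

def L(x, z):
    return x**2 + z * (1.0 - x)  # Lagrangian of the toy problem

x_star, z_star = 1.0, 2.0
xs = np.linspace(-3.0, 3.0, 601)
zs = np.linspace(0.0, 10.0, 601)

assert np.all(L(x_star, zs) <= L(x_star, z_star) + 1e-12)  # max over z at z*
assert np.all(L(xs, z_star) >= L(x_star, z_star) - 1e-12)  # min over x at x*
print("saddle value:", L(x_star, z_star))  # 1.0, the common primal/dual optimum
```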

Longer answer:

The function $\phi$ is the minimal notation we need to describe the perturbation viewpoint. The Lagrangian only appears later, when we write the dual problem more explicitly.

The perturbation viewpoint provides one possible explanation (my favorite explanation) of where the Lagrangian comes from, where the dual problem comes from, and why we expect strong duality to hold for convex problems. In the thread you linked to, note that the Lagrangian appears naturally at the very end. (The appearance of the Lagrangian should have been highlighted more explicitly.)

I'll repeat myself a bit here, but I will also add something new: a derivation of the Lagrangian when we have equality constraints in addition to inequality constraints. (I've been meaning to write this down anyway.) I'll work in $\mathbb R^n$ rather than an abstract real vector space, and I'll use slightly different notation.

The primal problem is to minimize $\phi(x,0)$ with respect to $x$ (where $\phi:\mathbb R^n \times \mathbb R^m \to \mathbb R \cup \{\infty\}$). The perturbed problems are to minimize $\phi(x,y)$ with respect to $x$. We should also introduce the "value function" $v(y) = \inf_x \, \phi(x,y)$. ($v$ is called $h$ in the other thread.) So the primal problem is to evaluate $v(0)$. Now, using some standard facts about the convex conjugate, we note that $v(0) \geq v^{**}(0)$, and that when $\phi$ is convex we typically have equality. Thus, the dual problem is to evaluate $v^{**}(0)$. This is a very clear way of understanding what the dual problem is and why we care about it. And we can express this dual problem in terms of $\phi$: the dual problem is to maximize $-\phi^*(0,z)$ with respect to $z$. (Here $\phi^*$ is the convex conjugate of $\phi$. See the other thread for details.)
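To illustrate with the same assumed toy problem: there $v(y) = \inf\{x^2 : 1 - x + y \leq 0\} = \max(1+y, 0)^2$ in closed form, which is convex, so we expect $v(0) = v^{**}(0)$. A crude numeric check (the grids truncate the suprema defining the conjugates):

```python
# Value function of the toy problem: v(y) = max(1 + y, 0)^2 (convex),
# so v(0) should equal the biconjugate v**(0).
import numpy as np

ys = np.linspace(-20.0, 20.0, 2001)
zs = np.linspace(-20.0, 20.0, 2001)

v = np.maximum(1.0 + ys, 0.0) ** 2  # closed form for the value function

# Conjugate v*(z) = sup_y { z*y - v(y) }, approximated on the grid
v_star = (zs[:, None] * ys[None, :] - v[None, :]).max(axis=1)

i0 = np.argmin(np.abs(ys))          # index of y = 0
print("v(0)   =", v[i0])            # 1.0 (the primal optimum)
print("v**(0) =", (-v_star).max())  # sup_z { -v*(z) } is approximately 1.0
```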

We haven't yet mentioned the Lagrangian. But the Lagrangian will appear when we work these ideas out in detail for the optimization problem \begin{align*} \text{minimize} & \quad f_0(x) \\ \text{subject to} & \quad f(x) \leq 0,\\ & \quad h(x) = 0, \end{align*} where $f_0:\mathbb R^n \to \mathbb R$, $f:\mathbb R^n \to \mathbb R^m$, and $h:\mathbb R^n \to \mathbb R^p$. The inequality $f(x) \leq 0$ should be interpreted component-wise.

We can perturb this problem as follows: \begin{align*} \text{minimize} & \quad f_0(x) \\ \text{subject to} & \quad f(x) + y_1 \leq 0,\\ & \quad h(x) +y_2= 0. \end{align*} This perturbed problem can be expressed as minimizing $\phi(x,y_1,y_2)$ with respect to $x$, where \begin{align*} \phi(x,y_1,y_2) = \begin{cases} f_0(x) & \quad \text{if } f(x) + y_1 \leq 0 \text{ and } h(x) + y_2 = 0, \\ \infty & \quad \text{otherwise}. \end{cases} \end{align*}
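A direct transcription of this $\phi$ into code, for an assumed toy instance: minimize $x_1^2 + x_2^2$ subject to $x_1 \geq 0$ and $x_1 + x_2 = 1$, with optimal value $1/2$ at $(1/2, 1/2)$. The tolerance is an artifact of the sketch, since an equality constraint is essentially never hit exactly in floating point:

```python
# phi(x, y1, y2) for the toy instance: f0(x) = x1^2 + x2^2,
# f(x) = -x1 (so -x1 + y1 <= 0), h(x) = x1 + x2 - 1 (so h(x) + y2 = 0).
import numpy as np

def phi(x, y1, y2, tol=1e-9):
    x1, x2 = x
    feasible = (-x1 + y1 <= tol) and abs(x1 + x2 - 1.0 + y2) <= tol
    return x1**2 + x2**2 if feasible else np.inf

# The unperturbed primal problem is inf_x phi(x, 0, 0); check on a grid.
grid = np.linspace(-2.0, 2.0, 401)
best = min(phi((a, b), 0.0, 0.0) for a in grid for b in grid)
print(best)  # approximately 0.5, attained near (0.5, 0.5)
```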

To find the dual problem, we need to evaluate $-\phi^*(0,z_1,z_2)$, which is a relatively straightforward calculation (in the second step we substitute $y_1 = -f(x) - q$ with $q \geq 0$, and $y_2 = -h(x)$). \begin{align*} -\phi^*(0,z_1,z_2) &= - \sup_{\substack{x,\,y_1,\,y_2 \\ f(x) + y_1 \leq 0 \\ h(x) + y_2 = 0}} \langle 0,x\rangle + \langle z_1,y_1 \rangle + \langle z_2,y_2 \rangle - f_0(x) \\ &= -\sup_{x,\,q \geq 0} \, \langle z_1,-f(x) - q \rangle + \langle z_2,-h(x)\rangle - f_0(x) \\ &= \begin{cases} \inf_x \, f_0(x) + \langle z_1,f(x) \rangle + \langle z_2, h(x) \rangle & \quad \text{if } z_1 \geq 0 \\ -\infty & \quad \text{otherwise}. \end{cases} \end{align*} In the final step, the Lagrangian \begin{equation} L(x,z_1,z_2) = f_0(x) + \langle z_1,f(x) \rangle + \langle z_2,h(x) \rangle \end{equation} has appeared, as has the usual description of the dual function (which you obtain by minimizing the Lagrangian with respect to $x$). Up until this point, we didn't know what the Lagrangian was or why it would be relevant.
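Closing the loop on the same assumed toy instance: there the Lagrangian can be minimized over $x$ in closed form ($x_1 = (z_1 - z_2)/2$, $x_2 = -z_2/2$), and maximizing the resulting dual function over $z_1 \geq 0$, $z_2$ free recovers the primal optimum $1/2$:

```python
# Dual of the toy instance: L(x, z1, z2) = x1^2 + x2^2 - z1*x1 + z2*(x1 + x2 - 1).
# Minimizing over x in closed form gives x1 = (z1 - z2)/2, x2 = -z2/2.
import numpy as np
from scipy.optimize import minimize

def dual(z):
    z1, z2 = z
    x1, x2 = (z1 - z2) / 2.0, -z2 / 2.0
    return x1**2 + x2**2 - z1 * x1 + z2 * (x1 + x2 - 1.0)

# Maximize dual(z) over z1 >= 0, z2 free, i.e. minimize -dual(z).
res = minimize(lambda z: -dual(z), x0=np.array([1.0, 0.0]),
               bounds=[(0.0, None), (None, None)])
print(res.x, -res.fun)  # z* near (0, -1); dual optimum 0.5 = primal optimum
```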