Nonlinear Optimization – Duality Theory Explained

duality-theorems, nonlinear-optimization

I have been studying nonlinear optimization recently and have come across some results that I need clarification on. I will do my best to explain them in detail below, providing citations where necessary, and I will number the issues/questions so that they can be addressed independently if needed. Apologies in advance for the length; I hope this question will also help clarify things for others.

Consider an equality-constrained nonlinear optimization problem
\begin{align}
\min_{x}&\quad f(x)\\
\nonumber \text{subject to } \quad&h_i(x) = 0,\,i=1,\ldots,I\\
\nonumber \quad&x\in X\subseteq\mathbb{R}^n
\end{align}
where all functions are assumed to be twice continuously differentiable. No further convexity/linearity assumptions regarding the functions are made.

I understand that a given optimization problem can have many different Lagrangian functions, depending on what constraints are dualized. One Lagrangian function associated with the nonlinear program is
$$\mathcal{L}(x,\lambda) = f(x) + \lambda^Th(x)$$
where $h(x) = (h_1(x),\ldots,h_I(x))^T$ and $\lambda\in\mathbb{R}^I$ is a vector of (unconstrained) dual variables.
Let us construct a dual function as
\begin{align}
\phi(\lambda) = \min_{x\in X}\mathcal{L}(x,\lambda)
\end{align}
(Side note: constructing this dual function can be just as difficult as solving the original problem, especially if the set $X$ is complicated.)

The dual problem is then simply
\begin{align}
\max_\lambda \phi(\lambda)
\end{align}
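To make these definitions concrete, here is a tiny numerical sketch on a toy problem of my own (the functions and numbers below are illustrative assumptions, not from the question):

```python
import numpy as np

# Toy instance (my own illustrative example): minimize f(x) = x^2
# subject to the single equality constraint h(x) = x - 1 = 0, with X = R.
# The primal optimum is clearly x* = 1 with value f(x*) = 1.
#
# Lagrangian: L(x, lam) = x^2 + lam*(x - 1). For each fixed lam this is
# strictly convex in x, so the inner minimum is attained at x(lam) = -lam/2,
# giving the dual function in closed form: phi(lam) = -lam^2/4 - lam.

def phi(lam):
    x = -lam / 2.0                 # unconstrained minimizer of L(., lam)
    return x**2 + lam * (x - 1.0)

# Solve the dual problem max_lam phi(lam) by a coarse grid search.
lams = np.linspace(-5.0, 5.0, 100001)
vals = phi(lams)
lam_star = lams[np.argmax(vals)]

print(round(lam_star, 6))    # -> -2.0, the optimal multiplier
print(round(vals.max(), 6))  # -> 1.0, matching the primal value (zero gap)
```

Here the problem is convex, so the dual maximum coincides with the primal optimum; the questions below are precisely about what survives of this picture without convexity.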

Q1: For a given Lagrangian function, is the dual function unique?

Local duality theory

The reason I ask Q1 is because of a result termed the local duality theorem (Luenberger, "Linear and Nonlinear Programming") stated as follows

Suppose that the problem
\begin{align}
\min_{x}&\quad f(x)\\
\nonumber \text{subject to } \quad&h_i(x) = 0,\,i=1,\ldots,I
\end{align}
has a local solution at $x^*$ with corresponding value $r^*$ and Lagrange multiplier $\lambda^*$. Suppose also that $x^*$ is a regular point of the constraints and that the Hessian of the Lagrangian at the solution, $\mathbf{L}^* = \nabla^2 f(x^*) + \lambda^{*T}\nabla^2 h(x^*)$, is positive definite. Then the dual problem
\begin{align}
\max_{\lambda}\quad \phi(\lambda)
\end{align}
has a local solution at $\lambda^*$ with corresponding value $r^*$ and $x^*$ as the point corresponding to $\lambda^*$ in the definition of $\phi$.
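As a sanity check of the theorem, here is a small numerical example of my own (the functions and numbers are illustrative assumptions, not Luenberger's):

```python
import numpy as np

# Toy instance: minimize f(x) = (x-2)^2 subject to h(x) = x^2 - 1 = 0.
# The feasible set is {-1, +1} and the solution is x* = 1 with value r* = 1.
# The first-order condition 2(x*-2) + 2*lam*x* = 0 at x* = 1 gives lam* = 1,
# and the Hessian of the Lagrangian there is 2 + 2*lam* = 4 > 0, so the
# theorem applies.

def phi(lam):
    # For lam > -1 the Lagrangian (1+lam)x^2 - 4x + 4 - lam is strictly
    # convex in x, with unconstrained minimizer x(lam) = 2 / (1 + lam).
    x = 2.0 / (1.0 + lam)
    return (x - 2.0)**2 + lam * (x**2 - 1.0)

lams = np.linspace(0.0, 3.0, 30001)
vals = phi(lams)
lam_star = lams[np.argmax(vals)]

print(round(lam_star, 6))    # -> 1.0, the multiplier from the theorem
print(round(vals.max(), 6))  # -> 1.0, matching the value r*
```

The dual maximum lands exactly at $(\lambda^*, r^*)$ as the theorem predicts; in this particular example the local solution happens to also be global.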

To help understand what the theorem is saying (as far as I understand it; perhaps this is the incorrect interpretation), consider the following scenario. Imagine that the Lagrangian is a function of two primal variables $x_1,x_2$ and has a contour plot that looks like the following figure.

[Figure: contour plot of the Lagrangian over $(x_1,x_2)$, showing its five stationary points.]

The Lagrangian has five stationary points, three of which are local maxima, two of which are local minima.

Q2: Is it correct to think of there being two different dual functions, $\phi^1$ and $\phi^2$, each associated with their respective local minima? A sort-of local dual function?

Q3: If the interpretation in Q2 is correct, then it seems all the local duality theorem is saying is the following: if we maximize a dual function, say $\phi^1(\lambda)$, over $\lambda\in\Lambda^1$ (where $\Lambda^1$ is some local region corresponding to the subset of $\mathbb{R}^2$ on which the dual function was constructed), then we reach the same objective value as the local solution in the region where we formed $\phi^1$. Is this the correct line of thought?

This is where I start to get a bit confused. I was always under the impression that a zero duality gap implied that we have found the global solution to the primal problem. However, the local duality theorem seems to state that we reach a zero duality gap with a merely local solution of the primal problem. This leads to my next question:

Q4: Does a zero duality gap always imply that we have found a globally optimal solution? If so, why does this not contradict the local duality theorem?

Conditions for global optimality

The final group of questions I have is with respect to finding a global solution to the primal problem. It seems throughout this discussion that we are tantalizingly close to having a result that ensures that solving the dual problem will result in a zero duality gap with the global solution of the primal problem. The next question outlines what I think may be a sufficient condition for this:

Q5: It seems logical to think that if $\mathcal{L}(x,\lambda)$ is strictly convex in $x$ (in the whole space) then we will have a unique dual function and solving the dual problem will lead to a zero duality gap with the global solution of the primal problem. Is this reasoning sound? If not, does anyone know of a counterexample?

An alternative statement worth mentioning related to the concept of saddle points and global optimality is as follows (from Peter Morgan, “An explanation of constrained optimization for economists”)

If the saddle-point is a global $x$-min, $\lambda$-max saddle-point then the ‘$x$-part’ of the saddle-point is a global optimum for the constrained optimization problem.

This is summarized by a statement from the same source

Theorem 9.3 (Kuhn-Tucker Saddle-Point Theorem)
If the Lagrange function for the constrained problem possesses a global $x$-min, $\lambda$-max saddle-point $(x^*,\lambda^*)$ then $x^*$ is a globally optimal solution to the constrained problem.

Unfortunately, since I do not have access to the book by Morgan, I do not know precisely what he means by “global $x$-min, $\lambda$-max saddle-point”, nor do I have the proof.

Q6: Can anyone offer some explanation or insight into the statement by Morgan?
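For reference, the standard textbook version of this saddle-point result (which may differ in detail from Morgan's exact statement) can be sketched as follows. Say $(x^*,\lambda^*)$ is a global $x$-min, $\lambda$-max saddle point of $\mathcal{L}$ if
\begin{align}
\mathcal{L}(x^*,\lambda) \le \mathcal{L}(x^*,\lambda^*) \le \mathcal{L}(x,\lambda^*)\quad\text{for all } x\in X \text{ and all } \lambda.
\end{align}
The left inequality says $\lambda^Th(x^*)\le\lambda^{*T}h(x^*)$ for every $\lambda$; since $\lambda$ is unconstrained, this forces $h(x^*)=0$ (otherwise $\lambda$ could be chosen to make the left side arbitrarily large), so $x^*$ is feasible and $\mathcal{L}(x^*,\lambda^*)=f(x^*)$. The right inequality then gives, for every feasible $x$,
\begin{align}
f(x^*) = \mathcal{L}(x^*,\lambda^*) \le \mathcal{L}(x,\lambda^*) = f(x) + \lambda^{*T}h(x) = f(x),
\end{align}
so $x^*$ is a global optimum of the constrained problem. Note that no convexity is assumed; the catch is that such a global saddle point need not exist for nonconvex problems.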

Best Answer

A1: Yes, by definition your dual function for the original constrained problem is unique. Moreover, it is concave by construction (it is a pointwise minimum of functions affine in $\lambda$), and every value $\phi(\lambda)$ is a lower bound on the optimal value of the original problem; its supremum (or maximum, when attained) gives the best such bound.

A2: Yes, you are right, but you have to be careful here. I'm referring to Luenberger, Ch. 14, p. 442, eq. 13. The author clearly states that the dual function is defined in the vicinity of $(x^*,\lambda^*)$. Take a subset of $X$, say $S=X\cap B_\epsilon(x^*)$, where $B_\epsilon(x^*)$ is the ball of radius $\epsilon$ around $x^*$; then you can define $\phi_S(\lambda)=\min\limits_{x\in S}\mathcal{L}(x,\lambda)$, which is a sort of local dual function, and it is different from the original one.

A3: The author is a bit sloppy in the definitions of minima and locality, but as I understand it, local duality says the following: suppose your original problem has a local minimum $r^*$ at $x^*$ and the first-order optimality condition (which is local) holds with a corresponding $\lambda^*$. If you consider $x\in S$ (as in A2) and construct $\phi_S(\lambda)$, then under the assumptions of the theorem, solving the dual problem for $\phi_S(\lambda)$ recovers the same $(x^*, \lambda^*)$ and $r^*$.

A4: Zero gap is not violated in this case because you in fact consider only $x\in S$, not $x\in X$. On $S$ you have a locally convex problem and everything is nice, so you get a zero gap locally, but not necessarily for the original problem over $X$.
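The distinction in A2 and A4 between the global dual over $X$ and the restricted dual over $S$ can be seen numerically on a toy instance of my own (all functions and numbers below are illustrative assumptions):

```python
import numpy as np

# Toy instance: minimize f(x) = x^4 - x^2 subject to h(x) = x - 0.5 = 0,
# with X = R. The only feasible point is x* = 0.5, so r* = f(0.5) = -0.1875,
# with lam* = -f'(0.5) = 0.5 and Hessian of the Lagrangian
# 12*(0.5)^2 - 2 = 1 > 0, so local duality applies. The objective is
# nonconvex: it has a deeper basin near x = -0.707 that the global dual
# minimization falls into.

def L(x, lam):
    return x**4 - x**2 + lam * (x - 0.5)

xs_global = np.linspace(-2.0, 2.0, 4001)  # x over (a truncation of) X = R
xs_local = np.linspace(0.3, 0.7, 401)     # x over S, a neighborhood of x*
lams = np.linspace(-1.0, 2.0, 601)

phi_global = np.array([L(xs_global, lam).min() for lam in lams])
phi_local = np.array([L(xs_local, lam).min() for lam in lams])

print(round(phi_local.max(), 6))   # -> -0.1875: the restricted dual closes the gap
print(round(phi_global.max(), 3))  # -> -0.25: the global dual leaves a gap
```

The restricted dual $\phi_S$ attains the primal value $r^*=-0.1875$ at $\lambda^*=0.5$, while the global dual tops out strictly below it, exactly the "zero gap locally, but not over $X$" phenomenon described in A4.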
