[Math] Recovering the solution of an optimization problem from the dual problem

convex-optimization, duality-theorems

In the context of (usually convex) optimization problems:

I understand that I can build a Lagrange dual problem and, assuming strong duality holds (no duality gap), find the optimal value of the primal problem from that of the dual. Now I want to find the primal optimal point (i.e. the point at which the optimum is attained). Somehow, it is assumed that the primal and dual optima are attained at a common point (which is then a saddle point of the Lagrangian). Why/when is that true? Can strong duality hold only at a saddle point? Does that require the primal problem to be convex?

Moreover, from this it is assumed that it is enough to find the minimizer of the Lagrangian at the optimal dual variables (i.e. at the solution of the dual problem) in order to get the primal optimal point. Why is that true?

Thank you,
Dany

Best Answer

Ok, so I'm not totally sure whether this addresses all (some!) of your doubts, so let me know if it does not. As a disclaimer, what follows is just one out of several ways to view duality in optimisation (you can find others in the answers to this post).

To make things a bit concrete I'm going to assume that your primal problem is a minimisation problem with objective $f:\mathbb{R}^n\to\mathbb{R}$ and set of feasible points $X\subseteq\mathbb{R}^n$. Suppose we can find some function $\phi:\mathbb{R}^m\to\mathbb{R}$ and a set $Y\subseteq\mathbb{R}^m$ with the following property:

$\phi$ evaluated at any point $y\in Y$ gives a lower bound for $f$ over all of $X$. That is, for any $y\in Y$

$$f(x)\geq \phi(y)\quad\quad\forall x\in X.$$

Note that the above implies that for any $y\in Y$, $\phi(y)$ is also a lower bound for the optimum value of the primal (that is, for $p^*:=\inf_{x\in X}f(x)$). Then an interesting question to ask is: what is the best lower bound we can extract from $\phi$ and $Y$? That is, what is $d^*:=\sup_{y\in Y}\phi(y)$?
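
In other words, taking the infimum over $x\in X$ on the left and the supremum over $y\in Y$ on the right of the inequality above gives what is usually called weak duality:

$$d^*=\sup_{y\in Y}\phi(y)\;\leq\;\inf_{x\in X}f(x)=p^*.$$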

We call the problem of computing $d^*$ "a dual problem to the original primal problem", and we say that "strong duality" holds for this primal-dual pair if $d^*=p^*$. Strong duality is handy because (assuming we can solve the dual problem) it gives us a way to check how suboptimal a candidate optimal point $\tilde{x}$ of the primal (which we have computed in some way) is, by simply computing the difference $f(\tilde{x})-d^*$ (otherwise, since we don't know $p^*$, it is hard to judge how good our candidate $\tilde{x}$ is). Indeed, if $f(\tilde{x})=d^*$ we know that $\tilde{x}$ is optimal and we can stop looking for a better $x$.
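
Concretely, since $d^*\leq p^*\leq f(\tilde{x})$ whenever $\tilde{x}$ is feasible, the gap we can actually compute upper-bounds the true (unknown) suboptimality:

$$f(\tilde{x})-p^*\;\leq\;f(\tilde{x})-d^*.$$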

In order for the above to be of use, we should aim to construct a dual problem that is simple to solve. This is where the Lagrangian dual problem comes in; it is a dual problem which is always convex irrespective of whether the primal is. The objective function of the Lagrangian dual problem is

$$\phi(y):=\inf_{x\in X}L(x,y)$$

where $L$ is the so-called "Lagrangian function" and the $y$ are the "Lagrange multipliers" (with $Y$ being the set of admissible multipliers, e.g. $y\geq 0$ for the multipliers of inequality constraints).
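
To make this concrete, here is a minimal numerical sketch in Python (the toy problem and all numbers are my own illustration, not something from the question): the primal is $\min\{x^2 : x\geq 1\}$, so $p^*=1$ at $x^*=1$; the dual objective $\phi(y)$ for multipliers $y\geq 0$ is evaluated by an inner numerical minimisation (taken over all of $\mathbb{R}$ rather than just the feasible set, which still gives a valid lower bound since dropping the constraint can only make the infimum smaller), the dual is then maximised, and the resulting $d^*$ is used as the certificate described above.

```python
from scipy.optimize import minimize_scalar

# Toy primal (illustrative example of mine):
#   minimise f(x) = x^2   subject to  g(x) = 1 - x <= 0   (i.e. x >= 1)
# Optimal value p* = 1, attained at x* = 1.
f = lambda x: x**2
g = lambda x: 1.0 - x

# Lagrangian and dual objective phi(y) = inf_x L(x, y), for multipliers y >= 0.
L = lambda x, y: f(x) + y * g(x)

def phi(y):
    # inner minimisation over x (here unconstrained and easy), done numerically
    return minimize_scalar(lambda x: L(x, y)).fun

# Weak duality: phi(y) is a lower bound on f(x) at every feasible x.
for y in [0.0, 0.5, 2.0, 3.0]:
    for x in [1.0, 1.5, 2.0]:              # all feasible (x >= 1)
        assert phi(y) <= f(x) + 1e-9

# Dual problem: d* = sup_{y >= 0} phi(y), solved numerically on [0, 10].
res = minimize_scalar(lambda y: -phi(y), bounds=(0.0, 10.0), method="bounded")
y_star, d_star = res.x, -res.fun
print("y* ~", y_star, "  d* ~", d_star)    # ~2 and ~1 = p*, so no duality gap here

# Using d* as a certificate: how suboptimal is a feasible candidate x_tilde?
x_tilde = 1.2
print("suboptimality of x_tilde is at most", f(x_tilde) - d_star)
```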

To check whether strong duality holds for a given primal-dual pair there is a whole battery of sufficient conditions, collectively referred to as constraint qualifications. The most basic of these, which applies to the case where the primal is convex and the dual is the Lagrangian dual, is called Slater's condition. It asks that there exists at least one "strictly feasible" point for the primal (roughly speaking, a point at which all inequality constraints hold strictly). If I remember correctly, in this case strong duality holds if and only if $L(x,y)$ has a saddle point (which turns out to be a pair of an optimal point $x$ and optimal multipliers $y$); however, I don't have an intuitive explanation of why this is so.
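
For what it's worth, here is the standard chain of inequalities behind that claim (and behind recovering a primal solution from the dual one, as asked in the question). Assume the primal has the form $\min f(x)$ subject to $g_i(x)\leq 0$, that strong duality holds, and that the primal and dual optima are attained at $x^*$ and at multipliers $y^*\geq 0$ respectively. Then

$$f(x^*)=p^*=d^*=\phi(y^*)=\inf_{x\in X}L(x,y^*)\;\leq\;L(x^*,y^*)=f(x^*)+\sum_i y_i^*g_i(x^*)\;\leq\;f(x^*),$$

so every inequality in the chain is in fact an equality. In particular $x^*$ minimises $L(\cdot\,,y^*)$, which is why minimising the Lagrangian at the optimal multipliers can recover a primal optimal point (if that minimiser is unique, it must be $x^*$), and $y_i^*g_i(x^*)=0$ for every $i$ (complementary slackness), which in turn gives $L(x^*,y)\leq L(x^*,y^*)\leq L(x,y^*)$ for all $x\in X$ and $y\geq 0$, i.e. the saddle-point property.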

With regards to how to actually find an optimal point $x^*$: what typically happens in practice is that people run an optimisation algorithm (often gradient descent or Newton's method) separately on the primal and on the dual problem, which hopefully generates sequences $(x_n)$ and $(y_n)$ such that

$$f(x_1)\geq f(x_2)\geq f(x_3)\geq\dots,\quad\quad \phi(y_1)\leq \phi(y_2)\leq \phi(y_3)\leq\dots.$$

Every few iterations they stop and check whether the difference $f(x_n)-\phi(y_n)$ is smaller than some tolerance level and, if so, stop completely and declare $x_n$ optimal (up to that tolerance). They call the corresponding $y_n$ a certificate of optimality, because anyone can then verify that $x_n$ is (nearly) optimal by checking that the mentioned difference is small. This is the essence of the so-called primal-dual solvers. Note that if the primal is convex (and things are otherwise sufficiently nice), said algorithms are guaranteed to generate $x_n$ such that $f(x_n)\to p^*$ (and similarly for the dual problem).
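
As a rough sketch of that loop (again on the toy problem from the earlier snippet, whose dual objective has the closed form $\phi(y)=y-y^2/4$; the step size and tolerance below are arbitrary choices of mine):

```python
# Same toy problem as in the previous snippet (my own illustration):
#   primal: minimise f(x) = x^2       s.t. x >= 1   (p* = 1 at x* = 1)
#   dual:   maximise phi(y) = y - y^2/4  s.t. y >= 0   (d* = 1 at y* = 2)
f        = lambda x: x**2
f_grad   = lambda x: 2 * x
phi      = lambda y: y - y**2 / 4
phi_grad = lambda y: 1 - y / 2

x, y = 3.0, 0.0          # starting points for the primal and dual iterations
t, tol = 0.1, 1e-6       # step size and stopping tolerance (arbitrary)

for n in range(1000):
    # one projected-gradient descent step on the primal (projection onto x >= 1) ...
    x = max(1.0, x - t * f_grad(x))
    # ... and one projected-gradient ascent step on the dual (projection onto y >= 0)
    y = max(0.0, y + t * phi_grad(y))

    # every few iterations, check the duality gap of the current pair
    if n % 10 == 0 and f(x) - phi(y) < tol:
        break

print(f"x ~ {x:.6f}, certificate y ~ {y:.6f}, duality gap ~ {f(x) - phi(y):.2e}")
```

At termination, $y$ plays the role of the certificate mentioned above: the printed gap $f(x)-\phi(y)$ is an upper bound on $f(x)-p^*$, so anyone can check the (near-)optimality of $x$ without knowing $p^*$.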