Why is this expression the Lagrangian dual problem of this QCQP

$\newcommand{\bx}{\mathbf{x}}$
$\newcommand{\bC}{\mathbf{C}}$
$\newcommand{\bA}{\mathbf{A}}$
$\newcommand{\bl}{\boldsymbol{\lambda}}$
$\newcommand{\bb}{\mathbf{b}}$
$\newcommand{\bQ}{\mathbf{Q}}$
Reading this great paper, I am having difficulties understanding the following result:

Let us consider a QCQP in a general form as
$$
\begin{align}
\min_{\bx\in\mathbb{R}^n}&\quad
\bx^\top\bC_0\bx \tag{14}\label{14}\\
\mathrm{s.t.}& \quad
\bx^\top \bA_i\bx = b_i~,\quad i=1~,~\cdots~,m~.
\end{align}
$$
[…]
Its dual problem is
$$
\begin{align}
\max_{\bl}&\quad\bb^\top\bl\tag{19}\label{19}\\
\mathrm{s.t.}& \quad
\bQ(\bl) = \bC_0 – \sum_{i=1}^{m}\lambda_i\bA_i \succeq 0.
\end{align}
$$
where $\bb = [b_1,~\cdots~,~b_m],~\bl = [\lambda_1,~\cdots,~\lambda_m]\in\mathbb{R}^m$.
Problem \ref{19} is called the Lagrangian dual problem of problem \ref{14}, and $\bQ(\bl)$ is the Hessian of the Lagrangian.

I understand that the Hessian of the Lagrangian $L(\bx,\bl):=\bx^\top\bC_0\bx – \sum_{i=1}^m\lambda_i(\bx^\top\bA_i\bx-b_i)$ w.r.t. the unknowns $\bx$ is given by (ignoring a multiplication by $2$): $\bQ(\bl)=\bC_0 – \sum_{i=1}^{m}\lambda_i\bA_i$.

However, I don't understand why \ref{14} can be generally reformulated by maximizing (w.r.t. the Lagrange multipliers $\bl$) the weighted sum of the values corresponding to the equality constraints: $\bb^\top\bl$ subject to $\bQ(\bl)$ being PSD.

Any help is highly appreciated.

Comment

You are asking about the relation between primal and dual, if the objectives are equal, does $x$ have to be feasible etc. If you had a convex problem with strong duality then you would have $d^*=p^*$, but here you most likely only have $d^*\leq p^*$ and probably the inequality is strict in most cases because you are working with a nonconvex problem to start with. This procedure is only about deriving a lower bound.

Actually, it can also be explained in a different way because the paper you quote first relaxes (14) to a problem (18), which is a standard PSD rank relaxation, so the solution of (18) is obviously a lower bound on (14) (most likely not strict unless something extra can be proven). Then problems (18) and (19) are actually SDP duals (derivation of dual the same as here), so (18) and (19) have strong duality and the same objective value. That is summarized in the diagram you have on the same page of the paper.

Best Answer

Maybe I can write a step by step explanation to clarify what I think are your confusions. Let $p^*$ denote the optimal value of the original problem and $F$ its feasible set. We start with that problem and write the Lagrangian

$$L(x,\lambda)=x^TC_0x-\sum_i\left(\lambda_ix^TA_ix-\lambda_ib_i\right)=x^TQ(\lambda)x+\lambda^Tb.$$

As always, by construction, $L$ has the property that whenever $x\in F$ is a feasible point and $\lambda$ is arbitrary then $L(x,\lambda)\leq x^TC_0x.$ Next, we proceed to compute the dual function

$$g(\lambda)=\min_x L(x,\lambda).$$

Note that the minimum is over all $x$, whether $x\in F$ or not is irrelevant here. Every value of $g(\lambda)$ will be a lower bound on $p^*$:

$$g(\lambda)=\min_x L(x,\lambda)\leq \min_{x\in F} L(x,\lambda) \leq \min_{x\in F} x^TC_0x=p^*$$

by what we already established. Therefore, to get the best lower bound on $p^*$ we want to make $g(\lambda)$ as large as possible, ie. maximize it, and then still:

$$\max_\lambda g(\lambda)\leq p^*.$$

Now what is $g(\lambda)$?

If $Q(\lambda)$ is not PSD, ie. has negative eigenvalue, then the expression $x^TQ(\lambda)x+\lambda^Tb$ can go to $-\infty$ by choosing $x$ to move along the eigenvector of that negative eigenvalue. If $Q(\lambda)\succeq 0$ then the minimizer in definition of $g(\lambda)$ will be $x=0$ and $g(\lambda)=\lambda^Tb$. Explicitly

$$ g(\lambda) = \begin{cases}\lambda^Tb & \mathrm{if}\ Q(\lambda)\succeq 0\\-\infty & \mathrm{otherwise}\end{cases} $$ So $$\max_\lambda g(\lambda)=\max_{\{\lambda:Q(\lambda)\succeq 0\}} g(\lambda)=\max_{\{\lambda:Q(\lambda)\succeq 0\}} \lambda^Tb$$ or in other words maximizing $g(\lambda)$ is the same as solving the problem

$$\mathrm{maximize}\ \lambda^Tb$$ $$\mathrm{s.t.}\ Q(\lambda)\succeq 0$$

which we call the dual and the optimum $d^*$ of that problem satisfies $d^*\leq p^*$.

The procedure I summarized is the same as shown for example in https://docs.mosek.com/modeling-cookbook/duality.html#lagrangian-and-the-dual-problem

Best Answer

Comment

Related Solutions

Derivative of the Lagrangian dual: Can I treat the optimal parameter $x^*$ in the dual $g(\lambda) = L(x^*, \lambda)$ as a constant w.r.t $\lambda$

Related Question

Derivative of the Lagrangian dual: Can I treat the optimal parameter $x^$ in the dual $g(\lambda) = L(x^, \lambda)$ as a constant w.r.t $\lambda$