Single constraint quadratic optimization dual form expression using the Schur complement

convex-optimization, non-convex-optimization, quadratic-forms, schur-complement

The question Strong duality result for non-convex problem with two quadratic functions is related; here, however, I am trying to understand how the dual form of the problem comes about.


This dual form representation is presented in Appendix B, Sec. B.1 of Boyd & Vandenberghe's Convex Optimization, but the authors don't explain how it comes about. This appears to be an important result in optimization, so I would really like to understand it.

The primal problem is:
\begin{equation}\label{primal}
\begin{aligned}
& \underset{x}{\text{minimize}} & & x^TA_0x+2b_0^Tx+c_0 \\
& \text{subject to} & & x^TA_1x+2b_1^Tx+c_1 \leq 0 \\
\end{aligned}
\end{equation}

where $A_i \in S^n$, the set of symmetric $n \times n$ matrices, $b_i \in \mathbb{R}^n$, and $c_i \in \mathbb{R}$. No positive semidefiniteness assumption is made on the $A_i$. (Note that this is not a convex QCQP, because the $A_i$ are not assumed to be positive semidefinite.)
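The Lagrangian gathers the two quadratics into a single quadratic in $x$:

\begin{equation}
L(x,\lambda)=x^T(A_0+\lambda A_1)x+2(b_0+\lambda b_1)^Tx+(c_0+\lambda c_1),
\end{equation}

and the dual function is $g(\lambda)=\inf_x L(x,\lambda)$.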

The dual function is clearly

\begin{equation}
g(\lambda)=c_0 + \lambda c_1-(b_0 + \lambda b_1)^T(A_0+\lambda A_1)^{\dagger}(b_0 + \lambda b_1) \tag{1}
\end{equation}

provided that $A_0+\lambda A_1\succeq 0$ and $b_0 + \lambda b_1 \in C(A_0+\lambda A_1)$, where $C(\cdot)$ denotes the column space and $\dagger$ denotes the pseudoinverse. Otherwise, $g(\lambda)=-\infty$.
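To spell out where (1) comes from: under these two conditions, $L(\cdot,\lambda)$ is a convex quadratic in $x$, minimised at

\begin{equation}
x^{\star}=-(A_0+\lambda A_1)^{\dagger}(b_0+\lambda b_1),
\end{equation}

and substituting $x^{\star}$ back into $L(x,\lambda)$ yields (1). If either condition fails, $L(\cdot,\lambda)$ is unbounded below in $x$, which is why $g(\lambda)=-\infty$ there.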

The text says that using a Schur complement, a dual problem can be expressed as:
\begin{equation}\label{dual}
\begin{aligned}
& \underset{\gamma, \lambda}{\text{maximize}} & & \gamma \\
& \text{subject to} & & \lambda \geq 0 \\
& & & \begin{bmatrix}A_0+\lambda A_1 & b_0 + \lambda b_1 \\ (b_0 + \lambda b_1)^T & c_0 + \lambda c_1 - \gamma\end{bmatrix} \succeq 0
\end{aligned}
\end{equation}

which is an SDP (semidefinite program) with variables $\gamma, \lambda \in \mathbb{R}$.
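For what it's worth, here is a minimal sketch of this SDP in CVXPY (my own illustration, not from the text). The data are made up: I take $A_1 = I$, $b_1 = 0$, $c_1 = -1$, so the constraint is the trust region $x^Tx \leq 1$, and $A_0$ is symmetric but possibly indefinite.

```python
import numpy as np
import cvxpy as cp

# Made-up illustrative data: A0 symmetric, possibly indefinite.
n = 3
rng = np.random.default_rng(0)
G = rng.standard_normal((n, n))
A0 = (G + G.T) / 2
b0 = rng.standard_normal(n)
A1, b1, c0, c1 = np.eye(n), np.zeros(n), 0.0, -1.0  # trust region x^T x <= 1

gamma = cp.Variable()
lam = cp.Variable(nonneg=True)   # lambda >= 0

# Assemble the block matrix of the dual SDP.
b = cp.reshape(b0 + lam * b1, (n, 1), order="F")
M = cp.bmat([[A0 + lam * A1, b],
             [b.T, cp.reshape(c0 + lam * c1 - gamma, (1, 1), order="F")]])

prob = cp.Problem(cp.Maximize(gamma), [M >> 0])
prob.solve()
print("dual optimal value:", gamma.value, " lambda:", lam.value)
```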

I understand that
\begin{equation}
\begin{bmatrix}A_0+\lambda A_1 & b_0 + \lambda b_1 \\ (b_0 + \lambda b_1)^T & c_0 + \lambda c_1\end{bmatrix} \succeq 0
\end{equation}

is the block matrix whose generalized Schur complement appears in (1) above, but I don't see how to go any further.

How does this dual problem come about, and why does it make sense? Where does $\gamma$ come from, and why are we maximising just a single variable? And why does $\gamma$ appear in the $(2,2)$ block of the matrix?

Best Answer

After considerable time thinking about this, I believe I have worked out how the dual form expression comes about. I am posting the solution for future reference, for anyone else who might get stuck on this problem.

First, the dual problem can be thought of as finding the largest $\gamma$ such that

\begin{equation} c_0 + \lambda c_1-(b_0 + \lambda b_1)^T(A_0+\lambda A_1)^{\dagger}(b_0 + \lambda b_1) \geq \gamma \tag{2} \end{equation}

for some $\lambda \geq 0$: we want to maximise the value of the dual function, which is eqn (1) above. But this must be subject to the constraints $A_0+\lambda A_1\succeq 0$ and $b_0 + \lambda b_1 \in C(A_0+\lambda A_1)$.
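Equivalently, in epigraph form:

\begin{equation}
\underset{\lambda \geq 0}{\text{maximize}}\ g(\lambda)
\quad\Longleftrightarrow\quad
\underset{\gamma,\,\lambda}{\text{maximize}}\ \gamma
\quad\text{subject to}\quad \lambda \geq 0,\ g(\lambda) \geq \gamma.
\end{equation}

This is where $\gamma$ comes from, and why the objective is just the single scalar variable $\gamma$.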

Eqn (2) can trivially be rearranged as:

\begin{equation} c_0 + \lambda c_1-\gamma - (b_0 + \lambda b_1)^T(A_0+\lambda A_1)^{\dagger}(b_0 + \lambda b_1) \geq 0 \tag{3} \end{equation}

The left-hand side of eqn (3) is exactly a generalized Schur complement, which in this case has dimension $1 \times 1$, i.e. it is a scalar; for a scalar, $\succeq 0$ is by definition the same as $\geq 0$. So eqn (3), together with the constraints $A_0+\lambda A_1\succeq 0$ and $b_0 + \lambda b_1 \in C(A_0+\lambda A_1)$, is equivalent to the Schur complement matrix in the question being $\succeq 0$. Realising that the Schur complement expression here is actually a scalar was the "trick" that helped me understand why this dual form representation works.
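Concretely, the fact being used is the generalized Schur complement condition (see the Schur complement discussion in Appendix A.5 of the same text): for symmetric $A$,

\begin{equation}
\begin{bmatrix} A & b \\ b^T & c \end{bmatrix} \succeq 0
\quad\Longleftrightarrow\quad
A \succeq 0,\ \ b \in C(A),\ \ c - b^TA^{\dagger}b \geq 0.
\end{equation}

Applying this with $A = A_0+\lambda A_1$, $b = b_0+\lambda b_1$ and $c = c_0+\lambda c_1-\gamma$, the single LMI in the dual problem encodes all three conditions at once: $A_0+\lambda A_1 \succeq 0$, $b_0+\lambda b_1 \in C(A_0+\lambda A_1)$, and eqn (3).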

Thus, maximising $\gamma$ in the dual problem maximises the dual function subject to its necessary constraints.
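As a quick numerical sanity check of the scalar Schur complement argument (my own script; the data are made up), one can fix $A \succ 0$, $b$, $c$ standing in for $A_0+\lambda A_1$, $b_0+\lambda b_1$, $c_0+\lambda c_1$ and confirm that the block matrix is PSD exactly when $\gamma \leq c - b^TA^{\dagger}b$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
G = rng.standard_normal((n, n))
A = G @ G.T + np.eye(n)    # stand-in for A0 + lam*A1, PSD by construction
b = rng.standard_normal(n) # stand-in for b0 + lam*b1
c = 2.0                    # stand-in for c0 + lam*c1

threshold = c - b @ np.linalg.pinv(A) @ b   # largest feasible gamma

for gamma in (threshold - 0.1, threshold + 0.1):
    M = np.block([[A, b[:, None]],
                  [b[None, :], np.array([[c - gamma]])]])
    psd = np.linalg.eigvalsh(M).min() >= -1e-9
    print(f"gamma = {gamma:+.4f}  ->  block matrix PSD: {psd}")
```

The first line prints `True` (gamma below the threshold) and the second prints `False`, matching the scalar Schur complement criterion.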
