[Math] Derive the dual of the semidefinite program $\max_Y{\rm Tr}(X^T Y)$ subject to $\begin{pmatrix}I&Y\\ Y^T&I\end{pmatrix}\succeq0$

convex optimizationconvex-analysisduality-theoremsreal-analysissemidefinite-programming

I have a semidefinite program (SDP) here, which is equivalent to the trace norm,

\begin{equation}
\begin{array}{ll}
\max_{Y \in \mathbb{R}^{m \times n}} & \text{trace}(X^T Y) \\
\text{subject to} &
\left[
\begin{array}{cc}
I_m & Y \\
Y^T & I_n
\end{array}
\right]
\succeq 0,
\end{array}
\label{eq:aa:primal}
\end{equation}

Now I want to derive its dual in a form,

\begin{equation}
\begin{array}{ll}
\min_{\substack{W_{1} \in \mathbb{S}^{m}, \\ W_{2} \in \mathbb{S}^{n}}} & \text{trace}(W_{1}) + \text{trace}(W_{2}) \\
\text{subject to} &
\left[
\begin{array}{cc}
W_{1} & (1/2) X \\
(1/2) X^T & W_{2}
\end{array}
\right]
\succeq 0,
\end{array}
\label{eq:aa:dual}
\end{equation}

How am I supposed to introduce the dual variables here?

Best Answer

EDIT: I mistakenly misread the original problem as a minimization, not a maximization, so this is wrong. For one thing, the dual is indeed a minimization. But there will also be some sign differences as well. My apologies to the OP, and feel free to remove your upvotes and/or downvote.

Let me go ahead and write out my suggestion. As I mentioned, the Lagrange multiplier for any SDP constraint $Q \succeq 0$ is a PSD matrix of the same size. Thus the Lagrangian is $$L(Y,Z) = \langle X, Y \rangle - \left\langle Z,\begin{bmatrix} I_m & Y \\ Y^T & I_n \end{bmatrix} \right\rangle$$ where $Z\in\mathcal{S}^{m+n}_+$. (I'm using the definition $\langle A,B \rangle = \mathop{\textrm{Tr}}(A^TB)$ here.) To facilitate simplification, we express $Z$ in a block form: $$Z \triangleq \begin{bmatrix} Z_{11} & Z_{12} \\ Z_{12}^T & Z_{22} \end{bmatrix}$$ $$\begin{aligned} L(Y,Z) &= \langle X, Y \rangle - \left\langle \begin{bmatrix} Z_{11} & Z_{12} \\ Z_{12}^T & Z_{22} \end{bmatrix}, \begin{bmatrix} I_m & Y \\ Y^T & I_n \end{bmatrix} \right\rangle\\ &= \langle X, Y \rangle - \langle Z_{11}, I_m \rangle - \langle Z_{22}, I_n \rangle - 2 \langle Z_{12}, Y \rangle \\ &= \langle X - 2 Z_{12}, Y \rangle - \langle Z_{11}, I_m \rangle - \langle Z_{22}, I_n\rangle \end{aligned}$$ The dual function is therefore $$g(Z) = \inf_Y L(Y,Z) = \begin{cases} - \langle Z_{11}, I_m \rangle - \langle Z_{22}, I_n \rangle & X - 2 Z_{12} = 0 \\ -\infty & X - 2 Z_{12} \neq 0 \end{cases}$$ The $-\infty$ arises from the fact that if $X\neq 2Z_{12}$, that linear expression is unbounded below. So the only way you get a bounded value for $g(Z)$ is if that first term is identically zero. (For nonlinear dual functions, you would do a more traditional minimization; e.g., by differentiating with respect to $Y$.)

Given this dual function, the dual problem is \begin{array}{ll} \text{maximize} & g(Z) \\ \text{subject to} & Z \succeq 0 \end{array} Technically, this is the true Lagrange dual. But in practice we move the implicit domain constraints out of the dual function. \begin{array}{ll} \text{maximize} & - \langle Z_{11}, I_m \rangle - \langle Z_{22}, I_n \rangle \\ \text{subject to} & 2 Z_{12} = X \\ & Z \succeq 0 \end{array} Eliminating $Z_{12}$ we obtain \begin{array}{ll} \text{maximize} & - \langle Z_{11}, I_m \rangle - \langle Z_{22}, I_n \rangle = -\mathop{\textrm{Tr}}(Z_{11}) - \mathop{\textrm{Tr}}(Z_{22}) \\ \text{subject to} & \begin{bmatrix} Z_{11} & (1/2) X \\ (1/2) X^T & Z_{22} \end{bmatrix} \succeq 0 \end{array} Of course, we have $W_1 \rightarrow Z_{11}$ and $W_2 \rightarrow Z_{22}$. And the true dual is a maximization, not a minimization as you have written it, so flip yours to a maximization (negating the objective) and we're done.

Related Solutions

[Math] Dual of a semidefinite program

I'm sure the link @AC_MOSEK offers here will give you the instructions you need. I'm also fond of Optimization by Vector Space Methods by David Luenberger. It's a bit dated in some respects, but its treatment of Lagrange multipliers is particularly well suited for conic and semidefinite programming, in my view.

But what the heck, I'll do it for you right here.

But before I do, would you mind if I ask why you want the dual? If it is purely for the intellectual pursuit, or to help you understand your problem better, then that's great; those are very good reasons. But in my view---and I suspect I disagree with my friends at MOSEK here---it is not something your average optimizer should be doing to help them solve an SDP model. That's a task best left to the solver and/or modeling framework you're using. After all, the conversion process is tedious and error-prone; that makes it ideal for automation. If you make one little mistake, your "dual" will no longer have a useful relationship to the primal; and it won't necessarily be obvious why, until the results just look odd. And what about models that sit "in between" the primal and dual standard forms? It's not always clear which form is the best choice to convert to. So let the system take responsibility for that.

Unfortunately, most bare solvers for SDP out there today do not do the conversion for you. That's a shame, in my opinion. However, SDP-aware modeling frameworks like CVX (disclaimer: mine!), YALMIP, or MOSEK's own Fusion system, are capable of it. (Am I right about Fusion, @AC_MOSEK?) CVX doesn't even ask you permission, it just picks whatever is best for the solver!

Rant over. (Not really a rant, an appeal.) On to the problem.

The first thing to do is form the Lagrangian. Most students of constrained optimization will know what to do for linear equations and inequalities; the challenge is to get it right for the SDP constraint. A Lagrange multiplier for an SDP constraint is itself a positive semidefinite matrix $Z$. The corresponding term in the Lagrangian uses use the standard inner product for symmetric matrices $\langle X,Z \rangle\triangleq \sum_{ij} \bar{X}_{ij} Z_{ij}$ ($\bar{X}_{ij}$ refers to the convex conjugate, which you can ignore if you're dealing with a real problem, of course). When both $X$ and $Z$ are positive semidefinite, this inner product is nonnegative.

To prevent confusion, I always assume the primal is a minimization, so I have to negate your objective function to $-\lambda$ here. I hope you will agree this is not a big deal; and we'll reverse that at the end of the process, so we end up with a minimization, like we expect. The Lagrangian is

$$\textstyle L(\lambda,y,Z,v,w) = -\lambda - \langle Z, \sum_i y_i C_I -\lambda I \rangle - \langle v, 1 - \sum_i y_i\rangle - \langle w,y \rangle$$ In addition to our Lagrange multiplier $Z$ for the SDP constraint, we have a multiplier $v$ for the equation, and $w\in\mathbb{R}^L_+$ for the linear inequalities. (A rule of thumb I use to get the signs right for inequalities: make sure you're subtracting a nonnegative quantity.)

The dual function is defined as $g(Z,v,w)=\inf_{\lambda,y} L(\lambda,y,Z,v,w)$. If you take the derivative with respect to $\lambda$ and $y$, you get this: $$-1+\langle Z,I \rangle = 0 \quad\Longrightarrow\quad \mathop{\textrm{Tr}}(Z) = 1$$ $$-\langle Z,C_i\rangle + v - w_i = 0 \quad\Longrightarrow\quad v = \langle Z,C_i \rangle + w_i = 0, \quad i=1,2,\dots,L$$ Note that these equations don't involve $\lambda$ and $y$ at all. What this means is that $g(Z,v,w)$ is unbounded below unless these particular conditions are met. When they are met, the only term that does not cancel out is $-v$. So your dual function is $$g(Z,v,w) = \begin{cases} -\sum_i v_i & \text{conditions above are met} \\ -\infty & \text{otherwise} \end{cases}$$ The dual problem in its raw form is $$\begin{array}{ll} \text{maximize} & g(Z,v,w) \\ \text{subject to} & Z\succeq 0 \\ & w\geq 0 \end{array}$$ Since we negated our primal problem to begin with, let's correct that here and turn the dual into a minimization: $$\begin{array}{ll} \text{minimize} & -g(Z,v,w) \\ \text{subject to} & Z\succeq 0 \\ & w\geq 0 \end{array}$$ Technically, this is the dual, and we could stop here. In practice, we never stop here, because $g$ is this weird function with implicit domain constraints. It's hard to have an intuitive sense of the dual in this form. Let's make those constraints explicit, leaving us with a linear objective. $$\begin{array}{ll} \text{minimize} & \textstyle v \\ \text{subject to} & \mathop{\textrm{Tr}}(Z) = 1 \\ & v = \langle Z,C_i \rangle + w_i ~~ i=1,2,\dots,L \\ & Z \succeq 0 \\ & w\geq 0 \end{array}$$ Now, I can eliminate $w$ rather simply by converting the $L$ equations to inequalities: $$\begin{array}{ll} \text{minimize} & v \\ \text{subject to} & \mathop{\textrm{Tr}}(Z) = 1 \\ & v \geq \langle Z,C_i \rangle ~~ i=1,2,\dots,L \\ & Z \succeq 0\end{array}$$ If you really want to get ambitious, and you're willing to accept a nonlinear objective, you can eliminate $v$ as well. Convince yourself that this is it: $$\begin{array}{ll} \text{minimize} & \textstyle \max_{i=1,2,\dots,L} \langle Z, C_i \rangle \\ \text{subject to} & \mathop{\textrm{Tr}}(Z) = 1 \\ & Z \succeq 0\end{array}$$ Of course, most solvers require linear objectives and constraints, but this one is in my view more informative. And it's the one I'd enter into CVX, if I were doing it... EDIT: in fact, your primal problem has a similar nonlinear equivalent: $$\begin{array}{ll} \text{maximize} & \lambda_{\min}(\textstyle \sum_i y_i C_i) \\ \text{subject to} & \textstyle \sum_i y_i= 1 \\ & y\geq 0 \end{array}$$And of course, $\mathop{\textrm{Tr}}(Z)=\sum_i\lambda_i(Z)$, so both problems have an eigenvalue interpretation.

How to Convert Matrix Completion Problem to Standard SDP Form

As indicated in the comments, this is the standard nuclear-norm optimization problem.

With a modelling layer such as YALMIP or CVX, you would simply do something along the lines of (here YALMIP in MATLAB)

X = sdpvar(m,n,'full');
E = Z - X;
optimize([],E(:)'*E(:) + lambda*norm(X,'nuclear'));

If you manually want to get the quadratic expression into a form supported by standard SDP solvers, you would, e.g., write it using a second-order cone constraint by replacing the quadratic term with a new variable $t$ and add an SOCP model of $||vec(E)||^2\leq t$. With $e = vec(E)$, this is equivalent to $\left|\left|\begin{bmatrix}2e\\1-t\end{bmatrix} \right|\right|\leq 1+t$

A horrible way to model it (from a performance view) would be as the SDP constraint $\begin{bmatrix}t& e^T \\e &I \end{bmatrix} \succeq 0.$

Your second expression is the nuclear norm, which can be written as the minimal value of $\text{tr}(A) + \text{trace}(B)$ where $\begin{bmatrix}A& X\\X^T &B\end{bmatrix} \succeq 0.$. Hence, simply introduce the two symmetric matrices $A$ and $B$ with associated constraint, and use the sum of traces instead of your square-root expression in the objective.

Best Answer

Related Solutions

[Math] Dual of a semidefinite program

How to Convert Matrix Completion Problem to Standard SDP Form

Related Question