Minimize the sum of $r$-largest entries with constraints

convex optimizationlinear programmingoptimization

I was being stuck by this optimization problem for a long time. My optimization problem is about to minimize the sum of the largest $r$ entries of a vector.
$$
min_x \sum_{l=1}^r(Ax)_{[l]}~~s.t. Bx\le b
$$
where $x\in \mathbb{R}^n$ is the decision variable, and $(\cdot)_{[l]}$ is the $l$-th largest componet of $(\cdot)$.

The objective is convex and so do the constraints (see Example 3.14 on page 87 of "Convex Optimization" by Prof. Boyd). Furthermore, this problem can be transformed into the following linear programming
$$
\begin{aligned}
min_{x,z}~z\\
s.t.~ Bx\le b\\
QAx\le r1
\end{aligned}
$$
where $Q\in\mathbb{R}^{\left(\begin{array}{c}n\\r\end{array}\right)\times n}$ is the combination matrix which has $r$ entries with 1 and $(n-r)$ entries with 0. The matrix $Q$ actually indicates all combination of $r$ of $Ax$'s entries' sum.

This transformation seems to be great, and the transformed problem can be solved using the interior point method. However, there are more than $\left(\begin{array}{c}n\\r\end{array}\right)$ inequality constraints, which makes it hard to solve.

My question is how can we solve this problem more efficiently. I searched online and find that the "sub-gradient method" seems to work. However, this problem may converge rather slow.
Any suggestions and clues are welcomed. Thanks ahead.

Best Answer

You appear to talk about some naive enumerative representation of sum-of-k-largest. You can model that much more efficiently.

The sum of the $k$ largest values of $Ax\in \mathbb{R}^n$ can be represented, when used in a convexity preserving setting, using a scalar $t$ by introducing a scalar $s$ and vector $z\in \mathbb{R}^n$ with

$$ t\geq ks + \sum_{i=1}^n z_i,~z\geq 0,~ z +s \geq Ax $$

I do not have a reference in my head at the moment, although I suspect you can find it in the book you are referring to.

Related Solutions

Least Squares Problem with Non-Negativity Constraints – Optimization

Your problem can be conveniently re-written as \begin{eqnarray} \underset{x \in \mathbb{R}^K}{\text{min }}f(x) + g(x), \end{eqnarray} where $f: x \mapsto \frac{1}{2}\|Ax-b\|^2$ and $g = i_{\mathbb{R}^K_+}$, the indicator function (in the convex analytic sense) of the nonnegative $K$th orthant. $f$ is smooth with Lipschitz gradient ($\|A\|^2$ is a possible Lipschitz constant) while $g$ has a simple proximal operator $prox_g(x) := (x)_+$ (the orthogonal projector unto the aforementioned orthant). So, proximal methods like FISTA are your friend.

In your "main problem", the aforementioned orthant is simply replaced with the standard simplex. The projector unto this simplex, though inaccessible in closed form, can be computed very cheaply using (for example) the simple algorithm presented in section 3 of the paper http://www.magicbroom.info/Papers/DuchiShSiCh08.pdf.

The code can be implemented in 3 lines of Python:

import numpy as np


def proj_simplex(v, z=1.):
    """Projects v unto the simplex {x >= 0, x_0 + x_1 + ... x_n = z}.

    The method is John Duchi's O (n log n) Algorithm 1.
    """
    # deterministic O(n log n)
    u = np.sort(v)[::-1]  # sort v in increasing order
    aux = (np.cumsum(u) - z) / np.arange(1., len(v) + 1.)
    return np.maximum(v - aux[np.nonzero(u > aux)[0][-1]], 0.)

BTW, what is the proximal operator of an "arbitrary" convex function $g$ ?

Formally, \begin{eqnarray} prox_g(x) := \underset{p \in \mathbb{R}^K}{\text{argmin }}\|p-x\|^2 + g(p). \end{eqnarray}

"Proximable" functions (i.e functions for which the argmin problem in the definition above are easy to solve, for any point $x$) play just as important a rule as differentiable functions. The proximal operator lets you make "implicit gradient steps". Indeed, one has the characterization \begin{eqnarray}p = prox_g(x)\text{ iff } x - p \in \partial g(p), \end{eqnarray} where \begin{eqnarray}\partial g(p) := \{u \in \mathbb{R}^K | g(q) \ge g(p) + \langle u, q - p\rangle \forall q \in \mathbb{R}^K\}\end{eqnarray} is the subdifferential of $g$ at $p$ (this reduces to the singleton $\{\nabla g(p)\}$ if $g$ is differentiable at $p$). In your problem(s) above, the proximal operator happens to be a projection operator. In fact for any closed convex subset $C \subseteq \mathbb{R}^K$, a little algebra reveals that \begin{eqnarray} prox_{i_C}(x) := \underset{p \in \mathbb{R}^K}{\text{argmin }}\|p-x\|^2 + i_C(p) = \underset{p \in C}{\text{argmin }}\|p-x\|^2 =: proj_C(x), \end{eqnarray} where $i_C$ is the indicator function of $C$ defined by $i_C(x) := 0$ if $x \in C$; $+\infty$ otherwise. A less trivial example is the $\ell_1$-norm $\|.\|_1$ whose proximal operator (at rank $\gamma > 0$) is the so-called soft-thresholding operator $prox_{\gamma\|.\|_1}(x) = soft_\gamma(x) = (v_k)_{1\le k \le K}$, where \begin{eqnarray} v_k := \left(1- \dfrac{\gamma}{|x_k|}\right)_+x_k. \end{eqnarray}

Proximal operators are a handy tool in modern convex analysis. They find great use in problems arising in signal processing, game theory, machine learning, etc. Here is a nice place to start learning about proximal operators and similar objects: http://arxiv.org/pdf/0912.3522.pdf.

Most importantly, "under mild conditions" one can show (see the previous reference) that a point $x^*$ minimizes $f + g$ iff

\begin{eqnarray} x^* = prox_{\gamma g}(x^* - \gamma \nabla f(x^*)), \forall \gamma > 0 \end{eqnarray}

Thus the minimizers of $f + g$ coincide with the fixed-points of the operators $prox_{\gamma g}\circ(Id - \gamma \nabla f)$, $\gamma > 0$. This suggests the following algorithm, known as the forward-backward algorithm (Mureau; Lions and Mercier; P.L Combettes et al.) \begin{eqnarray} x^{(n+1)} = \underbrace{prox_{\gamma_n g}}_{\text{backward / prox step}}\underbrace{(x^{(n)} - \gamma_n \nabla f(x^{(n)}))}_{\text{forward / gradient step}}, \end{eqnarray}

for an appropriately chosen sequence of step-sizes $(\gamma_n)_{n \in \mathbb{N}}$.

If $g$ is constant, so that it suffices to minimize $f$ alone, then the above iterates become \begin{eqnarray} x^{(n+1)} = x^{(n)} - \gamma_n \nabla f(x^{(n)}), \end{eqnarray}

and we recognize our old friend, the gradient descent algorithm, taught in high school.

N.B.: $(x)_+$ denotes the componentwise maximum of $x$ and $0$.

What does it mean to minimize a convex function with “less than or equal to” inequality constraints? Why

Notice that $x^2-1$ is a convex function. If we consider $x^2-1 \ge 0$, we have $x \ge 1$ or $x \le -1$. That is the feasible set is not a convex set.

However, notice that $x^2-1 \le 0$ is equivalent to $-1 \le x \le 1$ is convex.

In general, we know that $\{ x \mid f_i(x) \le 0\}$ is a convex set and their intersection, that is the feasible set that you have written down is a convex set. It is a desirable property to minimize a convex objective function over a convex set, in particular, we know that a local minimum is a global minimum.

Also, notice that the first example is a special case of the general form.

$$\begin{align} \text{max } x_1 & +5 x_2 \\ \text{s.t. } x_1 & \le 150 \\ x_2 &\le 350 \\ x_1+x_2 &\le 400 \\ x_1 , x_2 &\ge 0 \end{align}$$

Is actually equivalent to

$$\begin{align}- \min -x_1 & -5 x_2 \\ \text{s.t. } x_1 & \le 150 \\ x_2 &\le 350 \\ x_1+x_2 &\le 400 \\ -x_1 &\le 0 \\ -x_2 &\le 0 \end{align}$$

Here $f_0$ is just $-x_1-5x_2$ and the $f_i$ are just the LHS of the constraint.

We can flip the inequality by multiplying a negative sign and in fact the general form even include the first form. The general form doesn't tell us whether $x_i$ is bounded above or below since linear functions are convex and we can flip the inequality signs. The crucial propery here is convexity.

For the follow up question:

Even if $f_i$ is not convex, it is still possible for the feasible region to be bounded. In fact, in special cases, it is even possible for it to be convex. As an example

Consider $$x^3-1 \le 0$$

Even thought the function $x^3-1$ is not convex, the feasible region is just $x \le 1$.

Imposing $f_i$ to be convex explicitly gives us a convenient way to construct a convex set and also use its properties in deriving algorithms or investigate property of this form of problems.

Best Answer

Related Solutions

Least Squares Problem with Non-Negativity Constraints – Optimization

What does it mean to minimize a convex function with “less than or equal to” inequality constraints? Why

Related Question