Minimizing the sum of an $\ell_2$ norm and a linear term over the simplex

Tags: convex-optimization, convex-analysis, convex-geometry

Let $\Delta_n := \{x \in \mathbb{R}^n \mid x \ge 0,\; \sum_{1 \le i \le n}x_i = 1\}$ be the standard simplex in $\mathbb R^n$. For $a, b \in \mathbb R^n$ with $a \notin \Delta_n$, consider the problem of computing the following value exactly:
$$\alpha := \min_{x \in \Delta_n}\|x-a\| + \langle b,x\rangle.$$
The case when $b = 0$ corresponds to the problem of projecting the point $a$ onto $\Delta_n$. This special case is well known to be solvable exactly in $\mathcal{O}(n)$ flops.
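For reference, here is a sketch of that projection step in Python/NumPy. This is the standard sort-based $\mathcal O(n \log n)$ scheme (the $\mathcal O(n)$ algorithms replace the sort with median selection); the name `proj_simplex` is just an illustrative choice, reused in the sketches further below.

```python
import numpy as np

def proj_simplex(a):
    """Euclidean projection of a onto {x : x >= 0, sum(x) = 1}.

    Sort-based O(n log n) sketch; O(n) variants use median selection.
    """
    n = len(a)
    u = np.sort(a)[::-1]                 # coordinates in decreasing order
    css = np.cumsum(u) - 1.0             # cumulative sums minus the unit budget
    rho = np.nonzero(u - css / np.arange(1, n + 1) > 0)[0][-1]
    tau = css[rho] / (rho + 1.0)         # optimal shift (Lagrange multiplier)
    return np.maximum(a - tau, 0.0)      # shift and clip at zero
```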

Now, for the general problem, it's not hard to rewrite (the swap of $\min$ and $\max$ below is justified by Sion's minimax theorem, since $\Delta_n$ is compact convex and the unit ball is convex)

\begin{equation}
\begin{split}
\alpha &:= \min_{x \in \Delta_n}\max_{\|y\| \le 1} \langle y, x - a \rangle + \langle b, x\rangle = \max_{\|y\| \le 1}\left(\min_{x \in \Delta_n}\langle y + b, x\rangle\right) - \langle y, a\rangle\\
&= \max_{\|y\| \le 1}\left(\min_{1 \le i \le n}(y_i + b_i)\right) - \langle y, a \rangle = \max_{t \in \mathbb R}\left(t - \min_{\|y\|^2 \le 1,\;y_i \ge t - b_i\,\forall i}\langle y,a\rangle\right).
\end{split}
\end{equation}

Question: Given $c \in \mathbb R^n$ (with $c_i \equiv b_i - t$), can one compute the value of $$\min_{\|y\|^2 \le 1,\;y \le c}-\langle a,y\rangle$$
analytically? For which values of $c$ does the latter problem have a solution?

No polynomial-time algorithm for an exact solution?

Observation: In low-dimensional cases ($n = 1, 2, 3$), when non-degenerate, it's not hard to sketch that this problem has solutions which are piecewise polynomials (or square roots of such) in the $a_i$'s and $c_i$'s, the number of pieces being on the order of $2^n$.


Update: Algorithm based on @fedja's answer + proof of Q-linear convergence in the "small perturbation" regime

For generality, let $a \in \mathbb R^n$ and let $C$ be a "simple" (in the sense of the projection-oracle assumption below) closed convex subset of $\mathbb R^n$ not containing the point $a$. Consider the problem
\begin{equation}
\text{minimize } \|x - a\| + \langle b, x\rangle\text{ subject to } x \in C.
\end{equation}

Fedja's idea.
The idea is to introduce a radial variable $r := \|x-a\|^{-1}$. Indeed,
using the well-known elementary inequality
\begin{equation}
t + t^{-1} \ge 2\; \forall t > 0,\text{ with equality iff } t = 1,
\end{equation}
it follows that $\forall x \in \mathbb{R}^n$ and $\forall r > 0$, we
have $$\|x-a\| \le \frac{1}{2}(r\|x-a\|^2 + r^{-1}),$$
with equality if and only if $r = \|x-a\|^{-1}$.
Thus, completing the square in $x$ (using $r\|x-a\|^2 + 2\langle b, x-a\rangle = r\|x - (a - r^{-1}b)\|^2 - r^{-1}\|b\|^2$), the optimal value $\alpha$ for the problem can be rewritten in the form
\begin{equation}
2\alpha - 2\langle a, b\rangle = \min_{r > 0,\,x \in C} r\|x - (a - r^{-1}b)\|^2 +
(1-\|b\|^2)r^{-1}.
\end{equation}

The algorithm.
Based on Fedja's idea of introducing the radial variable $r =
\|x-a\|^{-1}$, the following alternating iterative scheme
for solving the above problem is natural:
\begin{equation}
x^{(k + 1)} = \mathrm{proj}_C(a - b / r^{(k)}),\quad r^{(k+1)} := \|x^{(k + 1)} -
a\|^{-1},
\end{equation}
with $r^{(0)} > 0$.
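In code, the scheme reads as follows; a minimal sketch assuming an exact projection oracle `proj_C` (e.g. `proj_simplex` above when $C = \Delta_n$) and $\|b\| < 1$; the names `r0`, `tol`, `max_iter` and their defaults are illustrative choices, not part of the original post.

```python
import numpy as np

def minimize_norm_plus_linear(a, b, proj_C, r0=1.0, tol=1e-12, max_iter=100):
    """Alternating scheme: x <- proj_C(a - b/r), r <- 1/||x - a||.

    Sketch only: assumes a is not in C (so r stays finite) and ||b|| < 1
    (the regime covered by the contraction argument below). Returns the
    final iterate x and the objective value ||x - a|| + <b, x>.
    """
    r = r0
    for _ in range(max_iter):
        x = proj_C(a - b / r)                # x-update: project the shifted point
        r_new = 1.0 / np.linalg.norm(x - a)  # r-update: inverse distance to a
        done = abs(1.0 / r_new - 1.0 / r) <= tol
        r = r_new
        if done:
            break
    x = proj_C(a - b / r)
    return x, np.linalg.norm(x - a) + b @ x
```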
Next, we will prove some interesting facts about the numerics of the algorithm so obtained.

Q-linear convergence in the "small perturbation" regime: $\|b\| < 1$.
Eliminating the $x$ variable from the above iterates, we see that the
$r$-update can be rewritten as a Picard process
\begin{equation}
r^{(k+1)} = \|\mathrm{proj}_C(a - b/r^{(k)}) - a\|^{-1}.
\end{equation}
Let's prove that this process converges to any fixed accuracy in essentially $\mathcal
O(1)$ rounds.
This would mean that the cost of solving the "$b \ne 0$" problem is
essentially the cost of projecting onto $C$, i.e., the cost of
solving the "$b = 0$" problem.

Assumption.
Let's make the following minimal assumption: there is an oracle for exactly projecting onto $C$.

For example, this assumption holds for polyhedra, $\ell_p$ balls,
etc. In order to concentrate on the heart of the issue, let's
take for granted that the above process has a fixed point $r^{(*)} > 0$. Now,

\begin{equation}
\begin{split}
\left|\frac{1}{r^{(k+1)}} - \frac{1}{r^{(*)}}\right| &= \left|\,\|\mathrm{proj}_{C}(a - b/r^{(k)}) - a\| - \|\mathrm{proj}_{C}(a - b/r^{(*)}) - a\|\,\right|\\
&\le \|\mathrm{proj}_{C}(a - b/r^{(k)}) - \mathrm{proj}_{C}(a - b/r^{(*)})\| \\
&\le \|b/r^{(k)} - b/r^{(*)}\| = \|b\|\left|\frac{1}{r^{(k)}} - \frac{1}{r^{(*)}}\right|,
\end{split}
\end{equation}
where the first inequality is the reverse triangle inequality and the
second follows from the nonexpansiveness of the projection operator onto
a closed convex set. Thus, if $\|b\| < 1$, the residuals $\left|\frac{1}{r^{(k)}} - \frac{1}{r^{(*)}}\right|$ decay exponentially
fast (i.e., Q-linearly) at rate $\|b\|$.
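As a quick numerical sanity check of this rate (a sketch reusing `proj_simplex` from above, with made-up random data and $\|b\|$ scaled to $0.8$; nothing here is from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
a = rng.normal(size=n) + 2.0           # a point lying outside the simplex
b = rng.normal(size=n)
b *= 0.8 / np.linalg.norm(b)           # enforce ||b|| = 0.8 < 1

rs = [1.0]                             # r^{(0)} = 1
for _ in range(30):                    # Picard process on r
    x = proj_simplex(a - b / rs[-1])
    rs.append(1.0 / np.linalg.norm(x - a))

res = np.abs(1.0 / np.array(rs) - 1.0 / rs[-1])  # residuals |1/r^{(k)} - 1/r^{(*)}|
print(res[1:6] / res[:5])              # successive ratios: should be <= 0.8
```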

Best Answer

I would try to approach your original problem a bit differently. Note that $|x-a|\le \frac 12(r|x-a|^2+r^{-1})$ and the equality is attained for $r=|x-a|^{-1}$. Thus, $$ \min_x[|x-a|+\langle b,x\rangle]=\frac 12\min_r\min_x[r|x-a+r^{-1}b|^2+r^{-1}(1-|b|^2)]+\langle a,b\rangle. $$ However, the inner minimum can now be found rather quickly, and finding the outer one is a one-dimensional problem.

Moreover, I suspect that even the naive algorithm of choosing $r$ arbitrarily to start with, finding the corresponding $x$, and then changing $r$ to something stupid like $(|x-a|^{-1}+r)/2$ has a decent chance to work, but checking that would require some accurate analysis of the properties of the inner minimum as a function of $r$. What is true, however, is that if the minimizer $x$ for some $r$ satisfies $|x-a|=r^{-1}$, then you have the true minimum for the original expression, so, at least, you can recognize the solution when you see it this way.

Edit:

I thought it might make sense to add a few more remarks to what I wrote already.

Let's look at what we have so far and see if we can convert it into something that is guaranteed to work. Put $\rho=r^{-1}$. Then the main idea was to observe that instead of the original minimization problem, we can solve the problem $$ \Psi(x,\rho)=\frac 12\left[\frac{|x-a|^2}\rho+\rho\right]+\langle b,x\rangle\to\min $$ and that in this new problem the minimization in $x$ is equivalent to finding the closest point to $a-\rho b$.

Since $\Psi$ is jointly convex in $(x,\rho)$ and the best $\rho$ has the physical meaning of the distance from $a$ to $x$, the outer minimization problem is just the problem of finding the minimum of a convex function over a finite interval of length about the diameter of the convex body $C$ we project onto, so, even if the iterations fail, we can always resort to the good old bisection, which is guaranteed to give us the answer with precision about $2^{-m/2}|b|\operatorname{diam} C$, where $m$ is the number of steps.
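For concreteness, here is a sketch of that outer one-dimensional search. Since $\rho \mapsto \min_{x \in C}\Psi(x,\rho)$ is convex (partial minimization of a jointly convex function preserves convexity), this sketch uses golden-section search rather than plain bisection so as to avoid derivatives; the bracket `[lo, hi]` is an assumed input that must contain the optimal $\rho$, e.g. $[\operatorname{dist}(a, C),\, \operatorname{dist}(a, C) + \operatorname{diam} C]$, and `proj_C` is the projection oracle as before.

```python
import numpy as np

def outer_minimize(a, b, proj_C, lo, hi, iters=80):
    """Golden-section search for phi(rho) = min_{x in C} Psi(x, rho)
    over rho in [lo, hi], 0 < lo, where the inner minimum is attained
    at x(rho) = proj_C(a - rho * b), the closest point in C to a - rho*b.
    """
    def phi(rho):
        x = proj_C(a - rho * b)
        d = x - a
        return 0.5 * (d @ d / rho + rho) + b @ x

    g = (np.sqrt(5.0) - 1.0) / 2.0       # inverse golden ratio
    m1, m2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = phi(m1), phi(m2)
    for _ in range(iters):
        if f1 <= f2:                     # minimizer lies in [lo, m2]
            hi, m2, f2 = m2, m1, f1
            m1 = hi - g * (hi - lo)
            f1 = phi(m1)
        else:                            # minimizer lies in [m1, hi]
            lo, m1, f1 = m1, m2, f2
            m2 = lo + g * (hi - lo)
            f2 = phi(m2)
    rho = 0.5 * (lo + hi)
    x = proj_C(a - rho * b)
    return x, np.linalg.norm(x - a) + b @ x
```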

This is not an "exact" solution, of course. However, if you implement an exact algorithm on any real machine, you'll still never be able to get anything better than the machine precision. So, I'm not really sure how much sense the request for an exact solution makes here from the purely practical viewpoint. Still, notice that the closest point to $a-\rho b$ moves along a straight line as long as we stay in the relative interior of the same $k$-dimensional face of the simplex. Unfortunately, this gives us exponentially many lines, so just tracing all of them is impractical. However, we can always first run the bisection for a while until we reduce the situation to a sufficiently small interval, after which we can use only the lines that correspond to the points on that interval and, in principle, we can try to trace them. The ideal scenario is that after a few iterations in the bisection, we get the values of $x$ lying in the relative interior of the same $k$-dimensional face for both endpoints of the remaining interval, in which case we have just one line left and we can finish in a single step. It is not guaranteed to always happen, of course, but I suspect that it may work more often than not.
