Solve Large Scale Linear Least Squares with Simplex Constraints

Tags: algorithms, convex optimization, MATLAB, operations research, optimization

I need to solve a rather simple optimization problem, but its structure makes out-of-the-box commercial solvers fail. The problem is:

$$\begin{aligned}
&\min_{x_i}\;\Bigl\|\sum_{i=1}^{10}x_i\mathbf{a}_i - \mathbf{b}\Bigr\|_2^2 \\
&\text{s.t.:}\;\sum_{i=1}^{10}x_i = 1,\quad x_i\geq0
\end{aligned}$$

The decision variables are the scalars $x_i$, and typically the problem has 10 of them. The complexity comes from the fact that $\mathbf{a}_i, \mathbf{b}$ are very long constant vectors, ranging from 10,000 to 100,000 elements and possibly more. Without going too much into detail, black-box solvers fail because they model the problem over the full vectors, converting a 10-dimensional problem into one with 100K variables.

Since the problem has a simple structure (only complicated by the constant vectors), I think interior point algorithms could perform well here. Also, the Hessian is very simple and easy to compute: its $(i,j)$-th entry is simply $\langle \mathbf{a}_i, \mathbf{a}_j \rangle$, giving a $10\times 10$ matrix. I'm very familiar with optimization, but my experience is mostly with first-order algorithms. I'm almost clueless when it comes to IPMs and not sure how to implement an appropriate algorithm. Does anyone have a ready script I can use? Or any other tips that might help?
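For concreteness, the dimensionality reduction via the Gram matrix mentioned above can be sketched as follows (a NumPy illustration with random stand-in data; the array names are not from the original problem):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100_000, 10                 # vector length, number of variables
A = rng.standard_normal((m, n))    # columns play the role of the constant vectors a_i
b = rng.standard_normal(m)

# Precompute once: ||Ax - b||^2 = x'Gx - 2 f'x + ||b||^2,
# where G is the 10x10 Gram matrix with G[i, j] = <a_i, a_j>.
G = A.T @ A
f = A.T @ b
c = b @ b

x = np.full(n, 1.0 / n)            # a feasible point on the simplex
obj_small = x @ G @ x - 2 * f @ x + c      # uses only 10-dimensional objects
obj_full = np.linalg.norm(A @ x - b) ** 2  # uses the full 100K-long vectors
assert np.isclose(obj_small, obj_full)
```

After this one-time $O(mn^2)$ precomputation, every objective, gradient, or Hessian evaluation touches only $10\times 10$ data, regardless of the vector length $m$.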

Best Answer

The problem is given by:

$$\begin{aligned} &\min_{x_i}\;\Bigl\|\sum_{i=1}^{N}x_i\mathbf{a}_i - \mathbf{b}\Bigr\|_2^2 \\ &\text{s.t.:}\;\sum_{i=1}^{N}x_i = 1,\quad x_i\geq0 \end{aligned}$$

The problem is equivalent to:

$$ \begin{alignat*}{3} \arg \min_{x} & \quad & \left\| A \boldsymbol{x} - \hat{\boldsymbol{b}} \right\|_{2}^{2} \\ \text{subject to} & \quad & {x}_{i} \geq 0 \\ & \quad & \boldsymbol{x}^{T} \boldsymbol{1} = 1 \end{alignat*} $$

Where $ A = \left[ \boldsymbol{a}_{1}, \boldsymbol{a}_{2}, \ldots, \boldsymbol{a}_{N} \right] $ and $ \hat{\boldsymbol{b}} = \boldsymbol{b} $, since $ A \boldsymbol{x} = \sum_{i=1}^{N} x_i \boldsymbol{a}_i $.

This is basically non-negative least squares (NNLS) with a linear equality constraint.
You could easily solve it with many least squares solvers.
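As one illustration (a sketch, not the only option), once $ {A}^{T} A $ and $ {A}^{T} \boldsymbol{b} $ are precomputed, any small-scale constrained solver handles the resulting 10-variable QP; here using SciPy's SLSQP method with synthetic data:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
m, n = 50_000, 10
A = rng.standard_normal((m, n))
# Synthetic right-hand side near the range of simplex combinations of the columns
b = A @ rng.dirichlet(np.ones(n)) + 0.01 * rng.standard_normal(m)

G = A.T @ A          # 10x10, computed once
f = A.T @ b          # length-10, computed once

res = minimize(
    fun=lambda x: x @ G @ x - 2 * f @ x,   # ||Ax - b||^2 up to a constant
    jac=lambda x: 2 * (G @ x - f),
    x0=np.full(n, 1.0 / n),
    method="SLSQP",
    bounds=[(0.0, None)] * n,              # x_i >= 0
    constraints={"type": "eq", "fun": lambda x: x.sum() - 1.0},  # sum = 1
)
x = res.x
```

The solver never sees the long vectors, only the $10\times 10$ reduced problem.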

You could also use orthogonal projection onto the unit simplex with some acceleration (FISTA-like) in the projected gradient descent framework, which gives a low-memory and pretty fast solver even for large problems.

When using projected gradient descent you should precompute $ {A}^{T} A $ and $ {A}^{T} \hat{\boldsymbol{b}} $; the work per iteration is then very small.
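A minimal sketch of that approach, assuming the standard sorting-based simplex projection and FISTA-style momentum (function names are illustrative):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1} via sorting."""
    u = np.sort(v)[::-1]                 # sort descending
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / k > 0)[0][-1]
    tau = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + tau, 0.0)

def simplex_ls(A, b, iters=500):
    """Accelerated projected gradient for min ||Ax - b||^2 on the unit simplex."""
    G, f = A.T @ A, A.T @ b              # precomputed once, as suggested above
    step = 1.0 / (2.0 * np.linalg.eigvalsh(G)[-1])  # 1/L, grad = 2(Gx - f)
    n = A.shape[1]
    x = y = np.full(n, 1.0 / n)
    t = 1.0
    for _ in range(iters):
        x_new = project_simplex(y - step * 2.0 * (G @ y - f))
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))   # FISTA momentum
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

# Usage on synthetic data where the optimum is a known simplex point
rng = np.random.default_rng(2)
A = rng.standard_normal((2000, 10))
x_true = rng.dirichlet(np.ones(10))
b = A @ x_true
x = simplex_ls(A, b)
```

Each iteration costs only a $10\times 10$ matrix-vector product plus an $O(n\log n)$ projection, so memory and per-iteration time are independent of the vector length after the one-time precomputation.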
