[Math] Linear Least Squares with Non-Negativity Constraint

Tags: convex-optimization, karush-kuhn-tucker, least-squares, optimization, quadratic-programming

I am interested in the linear least squares problem:
$$\min_x \|Ax-b\|^2$$

Without constraints, the problem can be solved directly via the normal equations. With an additional linear equality constraint, it can also be solved directly, thanks to a Lagrange multiplier.
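For concreteness (a standard linear-algebra recap, not part of the original question): minimizing the equivalent objective $\tfrac12\|Ax-b\|_2^2$ without constraints gives the normal equations, and adding a linear equality constraint $Cx=d$ leads to a single linear system in $x$ and the Lagrange multiplier $\lambda$:
$$A^TA\,x = A^Tb, \qquad \begin{pmatrix} A^TA & C^T \\ C & 0 \end{pmatrix}\begin{pmatrix} x \\ \lambda \end{pmatrix} = \begin{pmatrix} A^Tb \\ d \end{pmatrix}.$$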

However, I could not find a direct way to solve this problem with a linear inequality constraint. The problem belongs to quadratic programming, and the methods mentioned on Wikipedia involve an iterative procedure. I also know about the Karush–Kuhn–Tucker conditions, but I cannot work with them in this particular problem, since the primal and dual feasibility conditions, together with the complementary slackness conditions, are numerous and not helpful in an abstract setting.


Let us assume the linear inequality constraint is indeed a simple enforcement of non-negativity:
$$x\geq0$$

Is there a direct method which could be applied to this simpler case? The only hope I have found so far lies in the method explained in a technical report by Gene H. Golub and Michael A. Saunders, Linear Least Squares and Quadratic Programming (May 1969), which was linked in another question.

Best Answer

You are asking about the nonnegative least squares problem. Matlab has a function to solve this type of problem (lsqnonneg). Here is a paper about an algorithm to solve nonnegative least squares problems:

Arthur Szlam, Zhaohui Guo and Stanley Osher, A Split Bregman Method for Non-Negative Sparsity Penalized Least Squares with Applications to Hyperspectral Demixing, February 2010

I also think the proximal gradient method can be used to solve it pretty efficiently.


Here are some details about how to solve this problem using the proximal gradient method. The proximal gradient method solves optimization problems of the form \begin{equation} \text{minimize} \quad f(x) + g(x) \end{equation} where $f:\mathbb R^n \to \mathbb R$ is convex and differentiable (with a Lipschitz continuous gradient) and $g:\mathbb R^n \to \mathbb R \cup \{\infty\}$ is convex. (We also require that $g$ is lower semi-continuous, a mild assumption that is usually satisfied in practice.) The optimization variable is $x \in \mathbb R^n$.

The nonnegative least squares problem has this form with
$$ f(x) = \frac12\| Ax - b \|_2^2 $$
and $g$ the convex indicator function of the nonnegative orthant:
$$ g(x) = \begin{cases} 0 & \quad \text{if } x \geq 0, \\ \infty & \quad \text{otherwise.} \end{cases} $$

The proximal gradient iteration is
$$ x^{k+1} = \text{prox}_{tg}\bigl(x^k-t \nabla f(x^k)\bigr), $$
where $t > 0$ is a fixed step size chosen in advance, which should satisfy $t \leq 1/L$ (with $L$ a Lipschitz constant for $\nabla f$). The prox-operator of $g$ is defined by
$$ \text{prox}_{tg}(x) = \text{arg}\,\min_u \; g(u) + \frac{1}{2t} \| u - x \|_2^2. $$
For our specific choice of $g$, you can see that $\text{prox}_{tg}(x) = \max(x,0)$, the componentwise projection of $x$ onto the nonnegative orthant. The gradient of $f$ is
$$ \nabla f(x) = A^T (Ax - b). $$
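A minimal sketch of this iteration in Python/NumPy (my own illustration, not from the answer; it uses the fixed step $t = 1/\|A\|_2^2$, since $\|A^TA\|_2$ is a Lipschitz constant for $\nabla f$):

```python
import numpy as np

def nnls_proximal_gradient(A, b, num_iters=5000, tol=1e-10):
    """Proximal (projected) gradient iteration for min 0.5*||Ax - b||^2 subject to x >= 0."""
    m, n = A.shape
    # Lipschitz constant of grad f(x) = A^T(Ax - b) is ||A^T A||_2 = sigma_max(A)^2.
    L = np.linalg.norm(A, 2) ** 2
    t = 1.0 / L                                 # fixed step size t <= 1/L
    x = np.zeros(n)
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)                # gradient of f
        x_new = np.maximum(x - t * grad, 0.0)   # prox of the indicator = projection onto x >= 0
        if np.linalg.norm(x_new - x) <= tol * max(1.0, np.linalg.norm(x)):
            x = x_new
            break
        x = x_new
    return x

# Small usage example with random data
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
x = nnls_proximal_gradient(A, b)
print(x)  # every entry is >= 0
```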

So, it's as simple as that. There is also a line search version of the proximal gradient method which I recommend using. The line search version adjusts the step size $t$ adaptively at each iteration, and it often converges much faster than the fixed step size version.
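For illustration, here is a rough sketch of one common backtracking rule (the function and parameter names are my own, and other line search variants exist): the candidate step is shrunk until the quadratic upper bound used in the proximal gradient analysis holds at the new point.

```python
import numpy as np

def nnls_prox_grad_backtracking(A, b, num_iters=1000, t0=1.0, beta=0.5):
    """Proximal gradient with backtracking line search for min 0.5*||Ax - b||^2 subject to x >= 0."""
    f = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2
    x = np.zeros(A.shape[1])
    t = t0
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)
        while True:
            z = np.maximum(x - t * grad, 0.0)   # prox step = projection onto x >= 0
            # Sufficient-decrease test: accept t if f(z) is below the quadratic model at x.
            if f(z) <= f(x) + grad @ (z - x) + np.dot(z - x, z - x) / (2 * t):
                break
            t *= beta                           # otherwise shrink the step and retry
        x = z                                   # the accepted t is kept for the next iteration
    return x
```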
