Let us first agree on the problem you are trying to solve. I assume you are looking at:
\begin{align*}
\min_{w}~&f(w):=\frac{1}{2} \Vert y-Xw\Vert^2 +\frac{\lambda}{2}w^Tw\\
s.t.~& w\geq 0
\end{align*}
where $y=\begin{pmatrix}y_1\\ \vdots \\y_n\end{pmatrix}$ contains the observations, $X=\begin{pmatrix}
\leftarrow & x_1 & \rightarrow\\
&\vdots&\\
\leftarrow & x_n & \rightarrow
\end{pmatrix}$ the feature vectors and $w=\begin{pmatrix}w_1\\ \vdots \\ w_m \end{pmatrix}$ the weights.
The simplest algorithm you could use to solve the problem is a projected gradient method. Consider the gradient of the objective function at a point $w_k$:
\begin{equation*}
\nabla_w f(w_k)=(X^TX+\lambda I)w_k-X^Ty
\end{equation*}
and let $\Pi_C$ be the projection onto the set $C:=\{w:~w\geq 0\}$, specifically:
$\Pi_C(w)=\begin{pmatrix}\tilde{w}_1\\ \vdots \\ \tilde{w}_m \end{pmatrix}$, where $\tilde{w}_i=\begin{cases}
w_i &\text{if }w_i\geq 0\\
0 &\text{else}
\end{cases}$.
The projected gradient algorithm works just like the steepest descent algorithm with the addition of a projection step after the gradient step:
- Start with a feasible $w_0$ and let $k:=0$,
- at iteration $k$ let:
\begin{align*}
y_k&=w_k -\alpha \nabla_wf(w_k)\\
w_{k+1}&=\Pi_C(y_k)\\
k&=k+1
\end{align*}
where $\alpha$ is your step length, found for example using a line search. Repeat this step until you meet your desired stopping criterion.
Look into gradient projection methods to find something more elaborate, but this should do the trick.
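For concreteness, here is a minimal NumPy sketch of the projected gradient iteration described above. It is only a sketch: it uses a fixed step length $\alpha = 1/L$ (with $L$ the largest eigenvalue of $X^TX + \lambda I$) instead of a line search, and the function name and iteration count are illustrative.

```python
import numpy as np

def projected_gradient_nn_ridge(X, y, lam, n_iter=500):
    """Projected gradient for min_w 0.5*||y - X w||^2 + 0.5*lam*||w||^2  s.t.  w >= 0."""
    m = X.shape[1]
    w = np.zeros(m)                            # feasible starting point
    H = X.T @ X + lam * np.eye(m)              # the gradient is H w - X^T y
    alpha = 1.0 / np.linalg.norm(H, 2)         # fixed step length 1/L (largest eigenvalue of H)
    for _ in range(n_iter):
        grad = H @ w - X.T @ y                 # gradient at the current iterate
        w = np.maximum(w - alpha * grad, 0.0)  # gradient step followed by projection onto C
    return w
```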
The problem is given by:
$$\begin{aligned}
\arg \min_{x} \quad & \frac{1}{2} {\left\| x - y \right\|}_{2}^{2} + \gamma {\left\| x \right\|}_{1} \\
\text{subject to} \quad & \boldsymbol{1}^{T} x = b
\end{aligned}$$
The Lagrangian is given by:
$$ L \left( x, \beta \right) = \frac{1}{2} {\left\| x - y \right\|}_{2}^{2} + \gamma {\left\| x \right\|}_{1} + \beta \left( \boldsymbol{1}^{T} x - b \right) $$
The solution must obey the KKT conditions (the problem is convex and Slater's condition is satisfied), which are given by:
$$\begin{align*}
\nabla L \left( x, \beta \right) = x - y + \gamma \partial {\left\| x \right\|}_{1} + \beta \boldsymbol{1} & = 0 & \text{(1)} \\
\boldsymbol{1}^{T} x & = b & \text{(2)} \\
\end{align*}$$
From (1) it is clear that $ x = \operatorname{Prox}_{ \gamma {\left\| \cdot \right\|}_{1} } \left( y - \beta \boldsymbol{1} \right) $ (as a shifted solution to the $ {L}_{1} $ prox). Using (2) one would get:
$$ \boldsymbol{1}^{T} \operatorname{Prox}_{ \gamma {\left\| \cdot \right\|}_{1} } \left( y - \beta \boldsymbol{1} \right) = b \Rightarrow b - \boldsymbol{1}^{T} \operatorname{Prox}_{ \gamma {\left\| \cdot \right\|}_{1} } \left( y - \beta \boldsymbol{1} \right) = 0 $$
Using the explicit solution of $ \operatorname{Prox}_{ \gamma {\left\| \cdot \right\|}_{1} } \left( \cdot \right) $ as the Soft Threshold operator yields:
$$ g \left( \beta \right) = \sum_{i = 1}^{n} \operatorname{sign} \left( {y}_{i} - \beta \right) { \left( \left| {y}_{i} - \beta \right| - \gamma \right) }_{+} - b $$
Now the problem becomes finding the root of the function $ g \left( \beta \right) $ and then plugging it into the prox.
The function $ g \left( \beta \right) $ is monotonically decreasing in $ \beta $, and we are looking for its zero (which solves the problem).
One could find the root with any 1D solver over a bracketing interval, for instance $ \left[ \min(y) - \gamma - \left| b \right| / n, \; \max(y) + \gamma + \left| b \right| / n \right] $, which must contain the zero since $ \operatorname{sign} \left( t \right) { \left( \left| t \right| - \gamma \right) }_{+} \in \left[ t - \gamma, t + \gamma \right] $.
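As a rough illustration, here is a NumPy sketch of the 1D-solver route, assuming SciPy's `brentq` as the root finder and using the bracketing interval above; the function names are illustrative.

```python
import numpy as np
from scipy.optimize import brentq

def soft_threshold(v, gamma):
    """Elementwise soft threshold, i.e. the prox of gamma*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)

def prox_l1_sum_constrained(y, gamma, b):
    """Solve argmin_x 0.5*||x - y||_2^2 + gamma*||x||_1  s.t.  1^T x = b."""
    g = lambda beta: np.sum(soft_threshold(y - beta, gamma)) - b
    n = y.size
    lo = np.min(y) - gamma - abs(b) / n      # g(lo) >= 0 by the bound above
    hi = np.max(y) + gamma + abs(b) / n      # g(hi) <= 0 by the bound above
    beta = brentq(g, lo, hi)                 # any bracketing 1D root finder would do
    return soft_threshold(y - beta, gamma)
```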
A more efficient approach is based on the fact that $ g \left( \beta \right) $ is a piecewise-linear function of $ \beta $ with breakpoints where $ \left| {y}_{i} - \beta \right| = \gamma $ (hence there are $ 2 n $ of them).
Hence, by running a bisection over the sorted $ 2 n $ breakpoints, one can easily find the segment on which the function has constant slope and contains the zero; a sketch of this approach follows the derivation below.
Let's denote this segment by $ \left[ {\beta}_{min}, {\beta}_{max} \right] $; then $ \forall {\beta}_{i}, {\beta}_{j} \in \left[ {\beta}_{min}, {\beta}_{max} \right] : \; \operatorname{sign} \left( y - {\beta}_{i} \right) = \operatorname{sign} \left( y - {\beta}_{j} \right) = e $. This also implies that the support, $ \mathcal{S} = \left\{ i \mid { \left( \left| {y}_{i} - \beta \right| - \gamma \right) }_{+} \neq 0 \right\} $, is constant within this segment (as otherwise a new breakpoint would be created).
This means the equation becomes:
$$\begin{aligned}
0 & = \sum_{i \in \mathcal{S}} {e}_{i} \left( \left| {y}_{i} - \beta \right| - \gamma \right) - b & \text{} \\
& = \sum_{i \in \mathcal{S}} \left( {y}_{i} - \beta - \gamma {e}_{i} \right) - b & \text{ $ {e}_{i} \left| {y}_{i} - \beta \right| = \operatorname{sign} \left( {y}_{i} - \beta \right) \left| {y}_{i} - \beta \right|= {y}_{i} - \beta $ } \\
& \Rightarrow \sum_{i \in \mathcal{S}} \beta = \sum_{i \in \mathcal{S}} {y}_{i} - \sum_{i \in \mathcal{S}} \gamma {e}_{i} - b & \text{} \\
& \Rightarrow \beta = \frac{1}{ \left| \mathcal{S} \right| } \left( \sum_{i \in \mathcal{S}} \left( {y}_{i} - \gamma {e}_{i} \right) - b \right)
\end{aligned}$$
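Here is a NumPy sketch of this breakpoint/bisection approach, using the closed-form expression for $ \beta $ on the located segment; it is an illustration of the derivation above, not a drop-in replacement for the MATLAB code mentioned below.

```python
import numpy as np

def prox_l1_sum_constrained_exact(y, gamma, b):
    """Breakpoint search for argmin_x 0.5*||x - y||_2^2 + gamma*||x||_1  s.t.  1^T x = b."""
    soft = lambda v: np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)   # soft threshold
    g = lambda beta: np.sum(soft(y - beta)) - b                        # the function whose root we seek
    n = y.size
    bps = np.sort(np.concatenate((y - gamma, y + gamma)))              # the 2n breakpoints of g
    if g(bps[0]) <= 0:                        # root lies left of all breakpoints (slope -n there)
        beta = (np.sum(y - gamma) - b) / n
    elif g(bps[-1]) >= 0:                     # root lies right of all breakpoints
        beta = (np.sum(y + gamma) - b) / n
    else:
        lo, hi = 0, bps.size - 1              # invariant: g(bps[lo]) >= 0 > g(bps[hi])
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if g(bps[mid]) >= 0:
                lo = mid
            else:
                hi = mid
        r = y - 0.5 * (bps[lo] + bps[hi])                  # evaluate at the segment midpoint
        S = np.abs(r) > gamma                              # the (fixed) support on this segment
        e = np.sign(r[S])                                  # the (fixed) signs on this segment
        beta = (np.sum(y[S] - gamma * e) - b) / S.sum()    # closed-form root from the derivation
    return soft(y - beta)
```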
The full MATLAB code with CVX validation is available in my StackExchange Mathematics Q2886713 GitHub Repository.
Remark
Knowing the solution to the above problem (that is, having derived the prox operator) gives us an efficient method (proximal gradient descent) to solve the following, where the constraint level is written as $ c $ since $ b $ now denotes the data vector:
$$\begin{aligned}
\arg \min_{x} \quad & \frac{1}{2} {\left\| A x - b \right\|}_{2}^{2} + \gamma {\left\| x \right\|}_{1} \\
\text{subject to} \quad & \boldsymbol{1}^{T} x = c
\end{aligned}$$
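As a rough sketch of this remark, a proximal gradient loop simply alternates a gradient step on the smooth term with the prox derived above (reusing the illustrative `prox_l1_sum_constrained` helper from the earlier sketch; names, step size, and iteration count are assumptions, not a tuned implementation):

```python
import numpy as np

def prox_grad_sum_constrained_lasso(A, bvec, gamma, c, n_iter=500):
    """Proximal gradient for min_x 0.5*||A x - bvec||_2^2 + gamma*||x||_1  s.t.  1^T x = c."""
    n = A.shape[1]
    x = np.full(n, c / n)                                 # feasible starting point
    alpha = 1.0 / np.linalg.norm(A, 2) ** 2               # fixed step size 1/L for the smooth part
    for _ in range(n_iter):
        v = x - alpha * (A.T @ (A @ x - bvec))            # gradient step
        x = prox_l1_sum_constrained(v, alpha * gamma, c)  # prox step with the operator derived above
    return x
```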
With the ADMM framework we can even tackle more general cases.
Reference
The solution is based on Efficient Solvers for Sparse Subspace Clustering.
Best Answer
This is a fairly simple variation of the LASSO, as the constraint is easy to handle (namely, its projection has a simple closed-form solution).
Let's define $ f \left( x \right) = \frac{1}{2} \left\| A x - b \right\|_{2}^{2} + \lambda \left\| x \right\|_{1} $.
I solved it using 3 methods.
Projected Subgradient Method
A subgradient is given by:
$$ \partial f \left( x \right) = {A}^{T} \left( A x - b \right) + \lambda \operatorname{sign} \left( x \right) $$
The projection onto $ \mathbb{R}_{+} $ is given by:
$$ \operatorname{Proj}_{ \mathbb{R}_{+} } \left( x \right) = \max \left\{ x, 0 \right\} $$
The iteration is simple:
$$ {x}^{k + 1} = \operatorname{Proj}_{ \mathbb{R}_{+} } \left( {x}^{k} - \alpha \partial f \left( {x}^{k} \right) \right) $$
Where $ \alpha $ is the step size of the algorithm (for the subgradient method a diminishing step size is typically used).
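A minimal NumPy sketch of this iteration, assuming a diminishing step size $ {\alpha}_{k} = 1 / (L \sqrt{k}) $ with $ L = \left\| A \right\|_{2}^{2} $ (function name and iteration count are illustrative):

```python
import numpy as np

def projected_subgradient_nn_lasso(A, b, lam, n_iter=5000):
    """Projected subgradient for min_x 0.5*||A x - b||_2^2 + lam*||x||_1  s.t.  x >= 0."""
    x = np.zeros(A.shape[1])
    L = np.linalg.norm(A, 2) ** 2                        # Lipschitz constant of the smooth part
    for k in range(1, n_iter + 1):
        sub = A.T @ (A @ x - b) + lam * np.sign(x)       # a subgradient of f at x
        x = np.maximum(x - sub / (L * np.sqrt(k)), 0.0)  # diminishing step, then project onto x >= 0
    return x
```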
Proximal Gradient Method
The Proximal Gradient Method for LASSO is well known.
One could see it as a generalization of the Gradient Method.
Hence adding the projection step above to its iteration solves the problem (faster) as well.
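A minimal NumPy sketch of this, assuming a fixed step size $ 1 / L $ and using the fact that the prox of the scaled $ {L}_{1} $ term combined with the non-negativity constraint reduces to a one-sided soft threshold (names are illustrative):

```python
import numpy as np

def proximal_gradient_nn_lasso(A, b, lam, n_iter=500):
    """Proximal gradient (ISTA-style) for min_x 0.5*||A x - b||_2^2 + lam*||x||_1  s.t.  x >= 0."""
    x = np.zeros(A.shape[1])
    alpha = 1.0 / np.linalg.norm(A, 2) ** 2     # fixed step size 1/L
    for _ in range(n_iter):
        v = x - alpha * (A.T @ (A @ x - b))     # gradient step on the smooth part
        x = np.maximum(v - alpha * lam, 0.0)    # one-sided soft threshold: prox of the L1 term plus x >= 0
    return x
```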
ADMM
There is a large degree of freedom to solve this using the ADMM Framework.
The approach I took is just a small alteration of the classic ADMM solution for the LASSO: I simply applied the projection to the variable that is being thresholded.
It doesn't work perfectly (the objective function isn't decreasing monotonically), yet it works really nicely, and the idea comes from the background of ADMM (dual decomposition).
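A rough NumPy sketch of this alteration on the standard scaled-form ADMM LASSO iteration; the penalty $ \rho $, iteration count, and names are assumptions and not tuned:

```python
import numpy as np

def admm_nn_lasso(A, b, lam, rho=1.0, n_iter=300):
    """ADMM for the LASSO, altered by projecting the thresholded variable onto x >= 0."""
    n = A.shape[1]
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)                                     # scaled dual variable
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))   # cache factorization of (A^T A + rho I)
    Atb = A.T @ b
    for _ in range(n_iter):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))  # x-update: ridge-type solve
        v = x + u
        z = np.maximum(np.abs(v) - lam / rho, 0.0) * np.sign(v)            # soft threshold (classic z-update)
        z = np.maximum(z, 0.0)                                             # the small alteration: project onto z >= 0
        u = u + x - z                                                      # dual update
    return z
```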
The results I got were validated against CVX.
Remarks
The code is available at my StackExchange Mathematics 2706108 GitHub Repository.