Minimize Augmented Lagrangian Function in ADMM for Lasso Problem – Solving ADMM Subproblems

convex-optimization, optimization, proximal-operators

The lasso problem for ADMM can be written as follows:

\begin{equation}
\underset{x,z}{\min}\;
\frac{1}{2}\|{Ax-b}\|^2_2+\lambda\|{z}\|_1, \quad \text{subject to } x-z=0
\end{equation}

The Augmented Lagrangian for this minimization is

\begin{equation}
F(x,z,u)=\Bigl(\tfrac{1}{2}\|{Ax-b}\|^2_2+\lambda\|{z}\|_1+u^T(x-z)+\frac{\rho}{2}\|x-z\|^2_2\Bigr)
\end{equation}

The iterations for $x,z,u$ are then carried out by minimizing $F$ with respect to $x$ and $z$. Could you show me how the minimization over $x$ and $z$ is done to obtain the following results? (I think setting the gradient of $F$ equal to zero is applied, but I could not see how to get the solutions.)

\begin{equation}
\begin{split}
&x^{k+1}=(A^TA+\rho I)^{-1}(A^Tb+\rho(z^k-u^k)) \\
&z^{k+1}=S_{\lambda/\rho}(x^{k+1}+u^k) \\
&u^{k+1}=u^k+x^{k+1}-z^{k+1}
\end{split}
\end{equation}

Best Answer

The ADMM steps (from https://web.stanford.edu/~boyd/papers/pdf/admm_distr_stats.pdf) can be written as

\begin{align*}
x &\leftarrow \arg\min_{x} f\left( x \right) + u^T \left( x - z \right) + \frac{\rho}{2} \left\| x - z \right\|_2^2 \\
&\equiv \arg\min_{x} f\left( x \right) + \frac{\rho}{2} \left\| x - z + u \right\|_2^2 \\
z &\leftarrow \arg\min_{z} g\left( z \right) + u^T \left( x - z \right) + \frac{\rho}{2} \left\| x - z \right\|_2^2 \\
&\equiv \arg\min_{z} g\left( z \right) + \frac{\rho}{2} \left\| x - z + u \right\|_2^2 \\
u &\leftarrow u + \left( x - z \right)
\end{align*}
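The equivalence between the two forms of each subproblem is just completing the square in the residual $r = x - z$:

\begin{align*}
u^{T} r + \frac{\rho}{2}\left\| r \right\|_2^2
= \frac{\rho}{2}\left\| r + \tfrac{1}{\rho} u \right\|_2^2 - \frac{1}{2\rho}\left\| u \right\|_2^2 ,
\end{align*}

so after rescaling the dual variable $u \leftarrow u/\rho$ and dropping the constant term (which does not change the minimizer), the linear term plus quadratic penalty collapse into the single quadratic $\frac{\rho}{2}\|x - z + u\|_2^2$; this is the scaled form used below.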

Let us say $f(x) = \frac{1}{2} \|A x - b \|_2^2$ and $g(z) = \lambda \| z\|_1$. We can exploit the proximal operator, defined as follows.

Definition. Let $f: {\rm dom}_f \to \left(-\infty, +\infty \right]$ be a closed proper convex function; then \begin{align*} {\rm prox}_{\lambda f}\left( x\right) := \left({I} + \lambda \partial f \right)^{-1} \left( x \right) = \arg\min_{u \in {\rm dom}_f} \left\{ f\left({u}\right) + \frac{1}{ 2\lambda} \left\|x - u \right\|_2^2\right\} . \end{align*}

Also, define for brevity (using the equivalent scaled form) $$F(x) := f \left( x \right) + \frac{\rho}{2} \left\| x - z + u \right\|_2^2$$

$$G(z) := g \left( z \right) + \frac{\rho}{2} \left\| x - z + u \right\|_2^2 .$$

Now, just find the (sub)gradients and set them to zero, that is,

$$0 \in \partial F(x) \Longleftrightarrow 0 \in \frac{1}{\rho}\partial f(x) + \left(x - z + u \right) \Longleftrightarrow x = \left(I + \frac{1}{\rho} \partial f \right)^{-1} \left( z - u\right) = \operatorname{prox}_{\frac{1}{\rho} f}\left( z - u\right)$$

and

$$0 \in \partial G(z) \Longleftrightarrow z = \left(I + \frac{1}{\rho} \partial g \right)^{-1} \left( x + u\right) = \operatorname{prox}_{\frac{1}{\rho} g}\left( x + u\right).$$
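(Here $\partial f(x) = \left\{ A^T\left(Ax - b\right) \right\}$ is a singleton since $f$ is differentiable, while $\partial g$ is the genuine subdifferential of the $\ell_1$ norm, which is set-valued at $0$.)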

Thus, the ADMM iterative steps are
\begin{align*}
x^{k+1} &:= \operatorname{prox}_{\frac{1}{\rho}f}\left( z^{k} - u^{k} \right) \\
z^{k+1} &:= \operatorname{prox}_{\frac{1}{\rho}g}\left( x^{k+1} + u^{k} \right) \\
u^{k+1} &:= u^{k} + \left( x^{k+1} - z^{k+1} \right)
\end{align*}

Now, you can use the prox operators for both the least-squares term $f(x)$ and the $\ell_1$-norm term $g(z)$.
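For concreteness, here is a minimal NumPy sketch of the three steps above applied to the lasso problem. It is a sketch only: I use a fixed iteration count rather than the primal/dual residual stopping criteria from the Boyd reference, and the names `admm_lasso` and `soft_threshold` are my own.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Elementwise soft-thresholding: sign(v) * max(|v| - kappa, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_lasso(A, b, lam, rho=1.0, num_iters=200):
    """ADMM (scaled dual form) for min 0.5*||Ax - b||_2^2 + lam*||z||_1 s.t. x = z."""
    m, n = A.shape
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)  # scaled dual variable
    Atb = A.T @ b
    # Factor (A^T A + rho I) once; it is reused in every x-update.
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
    for _ in range(num_iters):
        # x-update: prox of f, i.e. solve (A^T A + rho I) x = A^T b + rho (z - u)
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        # z-update: prox of g, i.e. soft-thresholding with threshold lam / rho
        z = soft_threshold(x + u, lam / rho)
        # dual update
        u = u + x - z
    return x, z
```

Caching the Cholesky factorization of $A^TA + \rho I$ is the standard trick here: the matrix does not change across iterations, so each $x$-update costs only two triangular solves.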


Appendix

The prox operators for $f(x) = \frac{1}{2} \|A x - b \|_2^2$ and $g(z) = \lambda \| z\|_1$ are given below.

\begin{align}
\operatorname{prox}_{\lambda f}\left( x \right) &= \arg\min_{v} \left\{ \frac{1}{2} \|A v - b \|_2^2 + \frac{1}{ 2 \lambda} \left\|x - v \right\|_2^2\right\} \\
\Longrightarrow 0 &= A^T\left( Av - b \right) - \frac{1}{ \lambda} \left( x - v \right) \\
\Longleftrightarrow 0 &= \left(A^TA + \frac{1}{ \lambda}I \right)v - \left(A^Tb + \frac{1}{ \lambda} x \right)\\
\Longleftrightarrow v &= \operatorname{prox}_{\lambda f}\left( x \right) = \left(A^TA + \frac{1}{ \lambda} I \right)^{-1}\left(A^Tb + \frac{1}{ \lambda} x \right).
\end{align}
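Plugging in the prox parameter $\lambda = 1/\rho$ and evaluating at $z^k - u^k$ recovers exactly the $x$-update quoted in the question:

$$x^{k+1} = \operatorname{prox}_{\frac{1}{\rho} f}\left( z^{k} - u^{k} \right) = \left(A^TA + \rho I \right)^{-1}\left(A^Tb + \rho \left( z^{k} - u^{k} \right) \right).$$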

\begin{align}
\operatorname{prox}_{\lambda \|\cdot\|_1}\left( z \right) &= \arg\min_{v} \left\{ \lambda \| v\|_1 + \frac{1}{ 2} \left\|z - v \right\|_2^2\right\} \\
&= \arg\min_{ \left\{v_i\right\}} \left\{ \sum_i \lambda|v_i| + \frac{1}{ 2} \sum_i \left(z_i - v_i \right)^2\right\}
\end{align}
Since the problem is separable, you can apply the subgradient optimality conditions coordinatewise to obtain the so-called soft-thresholding operator. Not to make this post too long, I refer you to, for instance, The Proximal Operator of the $ {L}_{1} $ Norm Function, which shows the derivation.
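For reference, the soft-thresholding operator acts elementwise as

$$S_{\kappa}\left( z \right)_i = \operatorname{sign}\left( z_i \right) \max\left( \left| z_i \right| - \kappa,\, 0 \right),$$

and applying it with prox parameter $1/\rho$, i.e. threshold $\kappa = \lambda/\rho$, at the point $x^{k+1} + u^k$ gives the $z$-update quoted in the question, $z^{k+1} = S_{\lambda/\rho}\left( x^{k+1} + u^{k} \right)$.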

I hope this helps you.
