Solved – Shrinkage operator for elastic net regularization

Tags: elastic net, glmnet, machine learning, optimization, regularization

Consider the elastic net problem

$$\min_x f(x) + \lambda_1 \Vert x \Vert_1 + \lambda_2 \Vert x \Vert_2^2$$

Is there a shrinkage operator for this objective function, similar to the soft thresholding operator for L1 regularization (which in this case would be $\text{sgn}(x) (\vert x \vert - \lambda_1)_+$)?

To elaborate:

In glmnet, for linear regression, given a design matrix $X \in \mathbb{R}^{n \times p}$ and a vector of observations $y \in \mathbb{R}^n$, the elastic net problem is of the form

$$\min_{\beta \in \mathbb{R}^p} \frac{1}{2n}\Vert y - X \beta \Vert_2^2 + \alpha \lambda \Vert \beta \Vert_1 + (1-\alpha)\frac{\lambda}{2}\Vert \beta \Vert_2^2$$

where $\alpha \in [0,1]$ is the elastic net parameter. When $\alpha = 1$ this is clearly equivalent to lasso linear regression, in which case the proximal operator for L1 regularization is soft thresholding, i.e.

$$\text{prox}_{\lambda \Vert \cdot \Vert_1}(v) = \text{sgn}(v)(\vert v \vert - \lambda)_+$$
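
For concreteness, here is a minimal NumPy sketch of that soft thresholding operator (the function name `soft_threshold` is my own choice, not glmnet's API):

```python
import numpy as np

def soft_threshold(v, lam):
    """Elementwise soft thresholding: prox of lam * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)
```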

My question is: When $\alpha \in [0,1)$, what is the form of $\text{prox}_{\alpha\lambda\Vert\cdot\Vert_1 + (1-\alpha)\frac{\lambda}{2}\Vert \cdot\Vert_2^2}$ ?

Best Answer

Two tutorials, Ryu et al. 2016 and Parikh et al. 2013, give a nice overview of operator splitting methods involving the gradient operator and the proximal operator. The proximal operator is a special case of the resolvent operator.

The resolvent of a relation $A$ on $\mathbb{R}^n$ is defined as $$ R = (I+\lambda A)^{-1} $$

where $I$ is the identity function.

The proximal operator of a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is defined as $$ \text{prox}_{\lambda f}(x) = \arg\min_z\left(\lambda f(z) + \tfrac{1}{2}\|z-x\|_2^2\right) $$

Here we only consider convex $f$. We can reduce $\text{prox}_{\lambda f}(x)$ to $R_{\lambda \partial f}(x)$ as follows: let $z = \text{prox}_{\lambda f}(x)$; then \begin{align} z &= \arg \min_z \lambda f(z)+ \tfrac{1}{2} \|z-x\|_2^2 \\ 0 &\in \lambda\partial f(z) + z - x \\ x &\in \lambda\partial f(z) + z \\ z &= (I + \lambda \partial f)^{-1}(x) \end{align}
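
As a quick numerical sanity check (my own illustration, not part of the derivation), one can verify that the closed-form soft threshold agrees with the defining minimization:

```python
import numpy as np
from scipy.optimize import minimize

lam = 0.5
x = np.array([1.7, -0.2, 0.9])

# Closed form: prox_{lam*||.||_1}(x) = sgn(x) * (|x| - lam)_+
closed = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Definition: argmin_z  lam*||z||_1 + 0.5*||z - x||_2^2
obj = lambda z: lam * np.abs(z).sum() + 0.5 * np.sum((z - x) ** 2)
numeric = minimize(obj, x0=np.zeros_like(x), method="Nelder-Mead").x

print(np.allclose(closed, numeric, atol=1e-3))  # True
```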

A nice property of the proximal operator is that the proximal point iteration $x^{k+1} = \text{prox}_{\lambda f}(x^k)$ converges at a linear rate when $f$ is strongly convex. For $\alpha \neq 1$, the elastic net objective function is $(1-\alpha)\lambda$-strongly convex.

Applying the same trick to the elastic net objective function, let $z = \text{prox}_{\text{elastic}}(\beta)$: \begin{align} z &= \arg \min_z \tfrac{1}{2n}\|Y - Xz\|_2^2 + \alpha \lambda\|z\|_1 + \tfrac{(1-\alpha)\lambda}{2} \|z\|_2^2 + \tfrac{1}{2}\|z -\beta\|_2^2 \\ 0 &\in \frac{X^TXz - X^TY}{n} + \alpha\lambda \,\text{sgn}(z) + (1-\alpha)\lambda z+ z - \beta \end{align}

There is no closed form for $z$ unless we impose an orthogonal design, $X^TX = I$. Then the first-order condition is separable across coordinates: \begin{align} 0 &\in z_i/n - (X^TY)_i/n + \alpha \lambda \,\text{sgn}(z_i) + (1-\alpha)\lambda z_i + z_i -\beta_i \\ \beta_i + (X^TY)_i/n &\in \alpha\lambda \,\text{sgn}(z_i) + \big((1-\alpha)\lambda + 1+ 1/n\big)z_i \end{align} Let $t_i = \frac{\beta_i + (X^TY)_i/n }{(1-\alpha)\lambda + 1+ 1/n}$; then $$ z_i = \text{sgn}(t_i)\left(|t_i| - \frac{\alpha \lambda}{(1-\alpha)\lambda + 1+ 1/n}\right)_+ $$
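
In code, the separable update under the orthogonal-design assumption looks like this (a sketch; function and variable names are my own):

```python
import numpy as np

def prox_elastic_orthogonal(beta, X, Y, lam, alpha):
    """Closed-form prox of the full elastic net objective, valid when X^T X = I."""
    n = X.shape[0]
    denom = (1.0 - alpha) * lam + 1.0 + 1.0 / n
    t = (beta + (X.T @ Y) / n) / denom
    return np.sign(t) * np.maximum(np.abs(t) - alpha * lam / denom, 0.0)
```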

Iteratively applying this proximal operator starting from $\beta_{\text{init}}$ yields $\beta^*$ with $\beta^* = \text{prox}_{\text{elastic}}(\beta^*)$, which implies $0 \in \partial f (\beta^*)$, i.e. $\beta^*$ minimizes the objective.
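
A minimal proximal point iteration using `prox_elastic_orthogonal` from the sketch above (the synthetic orthogonal design, tolerance, and iteration cap are my own choices):

```python
rng = np.random.default_rng(0)
n, p = 50, 5
X, _ = np.linalg.qr(rng.standard_normal((n, p)))   # orthonormal columns: X^T X = I
Y = X @ np.array([2.0, 0.0, -1.5, 0.0, 0.5]) + 0.1 * rng.standard_normal(n)
lam, alpha = 0.1, 0.5

beta = np.zeros(p)                                  # beta_init
for _ in range(1000):
    beta_new = prox_elastic_orthogonal(beta, X, Y, lam, alpha)
    if np.linalg.norm(beta_new - beta) < 1e-10:     # fixed point beta* reached
        break
    beta = beta_new
```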

$\text{prox}_{\alpha\lambda\|\cdot\|_1 + \frac{(1-\alpha)\lambda}{2} \| \cdot\|_2^2}$, the operator asked for in the question, is even easier to compute: \begin{align*} \beta_i - ((1-\alpha)\lambda +1 )z_i \in \alpha \lambda \,\text{sgn}(z_i) \end{align*} Again, let $t_i = \frac{\beta_i}{(1-\alpha)\lambda +1}$; then $z_i = \text{sgn}(t_i)\left(|t_i| -\frac{\alpha\lambda}{(1-\alpha)\lambda + 1}\right)_+$.
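
In code, this shrinkage operator, which answers the original question, is a one-liner (again, the name is my own):

```python
import numpy as np

def prox_elastic_penalty(beta, lam, alpha):
    """prox of alpha*lam*||.||_1 + (1-alpha)*(lam/2)*||.||_2^2, elementwise."""
    denom = (1.0 - alpha) * lam + 1.0
    t = beta / denom
    return np.sign(t) * np.maximum(np.abs(t) - alpha * lam / denom, 0.0)
```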

Please let me know if I made any numerical mistakes. Good luck!