Optimization – Proximal Operator of Weighted L2 Norm

Tags: convex-optimization, convex-analysis, optimization, proximal-operators

Define the weighted $ {L}_{2} $ norm $ {\left\| x \right\|}_{2,w} = \sqrt{ \sum_{i = 1}^{n} {w}_{i} {x}_{i}^{2} }$. Find the formula for $ \operatorname{prox}_{\lambda {\left\| \cdot \right\|}_{2,w}}(y) $, where $ \lambda > 0 $.

By definition we have:
$$ \operatorname{prox}_{\lambda {\left\| \cdot \right\|}_{2,w}} \left( y \right) = \arg \min_{x} \left( \lambda \sqrt{ \sum_{i = 1}^{n} {w}_{i} {x}_{i}^{2} } + \frac{1}{2} {\left\| x - y \right\|}_{2}^{2} \right) $$

But I have no idea how to proceed from here. Any ideas?

Best Answer

The previous answer contained a crucial mistake (thanks to the users in the comments for pointing it out) and became a mess of edits, so here's a new, correct one. Denote $\|x\|_{2,w}^2 = \sum_{i=1}^n w_ix_i^2$ and assume all weights $w_i$ are positive, so that $\|\cdot\|_{2,w}$ really is a norm. Define $$f(x) = \lambda\sqrt{ \sum_{i = 1}^{n} {w}_{i} {x}_{i}^{2} } + \frac{1}{2} {\left\| x - y \right\|}_{2}^{2}.$$ This is a convex function, being the sum of a scaled norm and half the squared Euclidean distance to $y$. It is not differentiable everywhere, but it is continuous, so we can essentially replace the gradient by the subdifferential, defined as $$\partial f(x) = \{v\in\mathbb{R}^n \ | \ f(z) \geq f(x) + \langle v, z-x\rangle \text{ for all $z$}\}$$ (here $z$ is a dummy variable; $y$ is reserved for the point at which we evaluate the prox). Then standard facts from convex analysis tell us that:

  • $x$ is a minimizer of $f$ if and only if $0\in \partial f(x)$;
  • if $f$ is differentiable at $x$, then $\partial f(x)$ is the singleton containing only the gradient of $f$ at $x$;
  • given continuous convex functions $f,g$, the subdifferential $\partial (f+g)(x)$ is the Minkowski sum of $\partial f(x)$ and $\partial g(x)$.

Now we compute the subdifferential of $f$. When $x\neq0$, both summands are differentiable, so $\partial f(x)$ is the singleton containing $$ \nabla f(x) = \frac{\lambda}{\|x\|_{2,w}}Wx + (x-y),$$ where $W$ is the diagonal matrix with the $w_i$'s on its diagonal.
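As a quick numerical sanity check of this gradient (in particular the sign of the quadratic term), here is a small Python sketch comparing it against central finite differences; the variable names and tolerances are illustrative, not part of the original answer.

```python
import numpy as np

# f(x) = lam * ||x||_{2,w} + 0.5 * ||x - y||_2^2 and its gradient for x != 0.
def f(x, y, w, lam):
    return lam * np.sqrt(np.sum(w * x**2)) + 0.5 * np.sum((x - y) ** 2)

def grad_f(x, y, w, lam):
    # lam / ||x||_{2,w} * W x + (x - y), with W = diag(w)
    return lam / np.sqrt(np.sum(w * x**2)) * (w * x) + (x - y)

rng = np.random.default_rng(1)
x, y = rng.normal(size=4), rng.normal(size=4)
w = rng.uniform(0.5, 2.0, size=4)   # positive weights
lam, eps = 0.7, 1e-6

# Central finite differences, one coordinate at a time.
numeric = np.array([
    (f(x + eps * e, y, w, lam) - f(x - eps * e, y, w, lam)) / (2 * eps)
    for e in np.eye(4)
])
print(np.allclose(numeric, grad_f(x, y, w, lam), atol=1e-5))   # expect True
```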

The case $x=0$ is more interesting. The subdifferential of $\lambda\|\cdot\|_{2,w}$ at $0$ is by definition $$\{v \ | \ \langle v, z\rangle \leq \lambda \|z\|_{2,w}\text{ for all $z$}\},$$ which exactly means $\|v\|_{2,w}^*\leq \lambda$, where $\|\cdot\|_{2,w}^*$ is the dual norm of the weighted $\ell_2$ norm; a Cauchy-Schwarz argument (spelled out below) shows that $\|v\|_{2,w}^* = \sqrt{v^T W^{-1}v}$. Adding the gradient $-y$ of the quadratic term at $x=0$, the full subdifferential at zero is $$\partial f(0) = \{ v - y \ | \ \|v\|_{2,w}^*\leq \lambda\}.$$
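For completeness, here is the Cauchy-Schwarz step behind that dual-norm formula (it only uses that $W$ has positive diagonal, as assumed above): $$\langle v, z\rangle = \langle W^{-1/2}v,\, W^{1/2}z\rangle \leq \big\|W^{-1/2}v\big\|_2 \big\|W^{1/2}z\big\|_2 = \sqrt{v^T W^{-1} v}\,\|z\|_{2,w},$$ with equality at $z = W^{-1}v$, so $\|v\|_{2,w}^* = \sup_{z\neq 0} \langle v, z\rangle / \|z\|_{2,w} = \sqrt{v^T W^{-1} v}$.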

Now we must see when $0\in\partial f(x)$.

  • The condition $0\in\partial f(0)$ is that $\|y\|_{2,w}^* \leq \lambda$. So, when $\|y\|_{2,w}^* \leq \lambda$, $0$ is a global minimizer for $f$.
  • The condition $0\in\partial f(x)$ for $x\neq 0$ is that $$ y = \left(I + \frac{\lambda}{\|x\|_{2,w}}W\right) x.$$ Taking the squared $\|\cdot\|_{2,w}^*$ norm of the right-hand side shows that it is $> \lambda^2$ when $x\neq 0$ and all the weights $w_i$ are positive, i.e. $\|y\|_{2,w}^* > \lambda$. Thus the two cases for a global minimizer are exclusive. Furthermore, after some manipulations one can solve the above equation for the minimizer $x$; see the reduction and the sketch below.
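To make that last step concrete: write $t = \|x\|_{2,w} > 0$. Componentwise the stationarity equation reads $y_i = \left(1 + \frac{\lambda w_i}{t}\right) x_i$, hence $x_i = \frac{t\, y_i}{t + \lambda w_i}$. Substituting this back into $t^2 = \sum_i w_i x_i^2$ and dividing by $t^2$ gives the scalar equation $$\sum_{i=1}^{n} \frac{w_i y_i^2}{(t + \lambda w_i)^2} = 1,$$ whose left-hand side is strictly decreasing in $t > 0$ and tends to $(\|y\|_{2,w}^*/\lambda)^2$ as $t \to 0^+$, so it has a unique positive root exactly when $\|y\|_{2,w}^* > \lambda$, consistent with the case split above. Unless all the weights are equal, the root generally has to be found numerically. Here is a minimal Python sketch of the resulting prox computation; the function name, tolerance, and bisection scheme are illustrative choices, not prescribed by the answer above.

```python
import numpy as np

def prox_weighted_l2(y, w, lam, tol=1e-12, max_iter=200):
    """Prox of lam * ||.||_{2,w} at y, where ||x||_{2,w} = sqrt(sum_i w_i x_i^2).

    Assumes all w_i > 0. Name, tolerance and bisection scheme are illustrative.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)

    # Case 1: ||y||_{2,w}^* = sqrt(y^T W^{-1} y) <= lam  =>  the prox is 0.
    dual_norm_y = np.sqrt(np.sum(y**2 / w))
    if dual_norm_y <= lam:
        return np.zeros_like(y)

    # Case 2: find the unique t > 0 with g(t) = sum_i w_i y_i^2/(t + lam w_i)^2 = 1.
    # g is strictly decreasing, g(0+) = (dual_norm_y/lam)^2 > 1, and g(t) -> 0.
    def g(t):
        return np.sum(w * y**2 / (t + lam * w) ** 2)

    lo, hi = 0.0, 1.0
    while g(hi) > 1.0:          # grow the bracket until g(hi) <= 1
        hi *= 2.0
    for _ in range(max_iter):   # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if g(mid) > 1.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol * (1.0 + hi):
            break
    t = 0.5 * (lo + hi)

    # Recover the minimizer componentwise: x_i = t y_i / (t + lam w_i).
    return t * y / (t + lam * w)

# Check the stationarity condition  y = (I + lam/||x||_{2,w} W) x  on random data.
rng = np.random.default_rng(0)
y = rng.normal(size=5)
w = rng.uniform(0.5, 2.0, size=5)
lam = 0.3
x = prox_weighted_l2(y, w, lam)
t = np.sqrt(np.sum(w * x**2))
print(np.allclose(y, x + (lam / t) * (w * x)))   # expect True (here x != 0)
```

When all weights are equal, $w_i \equiv w$, the scalar equation gives $t = \sqrt{w}\,\|y\|_2 - \lambda w$, and the formula collapses to the familiar block soft-thresholding $x = \left(1 - \frac{\lambda \sqrt{w}}{\|y\|_2}\right)_+ y$.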