[Math] KKT conditions for a convex optimization problem with an L1-penalty and box constraints

convex-optimization, convex-analysis, karush-kuhn-tucker, nonlinear-optimization

I am having some trouble deriving and understanding the optimality conditions for a convex optimization problem of the form:

$$\begin{align}
\min_{x\in\mathbb{R}^d}~ & f(x) + C\|x\|_1 &\\
\text{s.t. } & x_i \leq u & \text{ for } i = 1,\ldots,d \\
& x_i \geq l & \text{ for } i = 1,\ldots,d
\end{align}$$

where:

  • $f:\mathbb{R}^d \rightarrow \mathbb{R}$ is smooth and (strictly) convex
  • $C>0$ is an $L_1$ regularization penalty
  • $l<0$ is the lower bound for each element of $x$
  • $u>0$ is the upper bound for each element of $x$

To derive optimality conditions, I let $\lambda_i^+$ and $\lambda_i^-$ denote the dual variables for the upper and lower bound constraints. The Lagrangian is then:

$$ f(x) + C\|x\|_1 + \sum_{i=1}^d \lambda_i^+(x_i - u) + \sum_{i=1}^d \lambda_i^-(l - x_i)$$

And I believe that the KKT conditions should be:

$$\begin{align}
\mathbf{0} & \in \nabla f(x) + C\,\partial \|x\|_1 + \lambda^+ - \lambda^- & \text{(stationarity)} \\
x_i & \in [l,u], \quad i = 1,\ldots,d & \text{(primal feasibility)} \\
\lambda_i^+,\ \lambda_i^- & \geq 0, \quad i = 1,\ldots,d & \text{(dual feasibility)} \\
\lambda_i^+(x_i - u) & = 0, \quad i = 1,\ldots,d & \text{(comp. slackness \#1)} \\
\lambda_i^-(l - x_i) & = 0, \quad i = 1,\ldots,d & \text{(comp. slackness \#2)}
\end{align}$$

where $\lambda^+ = [\lambda_1^+,\ldots,\lambda_d^+]^T$, $\lambda^- = [\lambda_1^-,\ldots,\lambda_d^-]^T$ and

$$(\partial \|x\|_1)_i = \begin{cases} \{1\} & \text{if } x_i > 0, \\ [-1,1] & \text{if } x_i = 0, \\ \{-1\} & \text{if } x_i < 0. \end{cases}$$
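
For concreteness, the stationarity condition then reads componentwise as
$$(\nabla f(x))_i + C\,g_i + \lambda_i^+ - \lambda_i^- = 0 \quad \text{for some } g_i \in (\partial \|x\|_1)_i, \qquad i = 1,\ldots,d,$$
which is the form I use in question 3 below.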

I am wondering:

  1. Can I use the subgradient of the $L_1$-norm to derive KKT conditions for this problem?
  2. Are the KKT conditions both necessary and sufficient for this problem?
  3. If $x_i = u$, then do the KKT conditions imply that $(\nabla f(x))_i + C + \lambda_i^+ = 0$?

Best Answer

A note on the Lagrangian (now corrected in the question above): it must contain $\lambda_i^-(l-x_i)$, not $\lambda_i^-(-l-x_i)$, which also affects the corresponding slackness condition; it should read $\lambda_i^-(l-x_i)=0$.

1) The KKT system you have written is nothing other than the subgradient optimality condition
$$ 0 \in \partial J(x), \qquad J(x) = f(x) + C\|x\|_1 + I_{(-\infty,u]^d}(x) + I_{[l,+\infty)^d}(x), $$
where $I_{(-\infty,u]^d}$ and $I_{[l,+\infty)^d}$ are the indicator functions of the boxes $(-\infty,u]^d$ and $[l,+\infty)^d$. Verify that the conditions for the subgradient sum rule are fulfilled, and write out the individual subgradients to obtain your system, as spelled out below.
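
Concretely, since $f$ and $\|\cdot\|_1$ are finite everywhere and the feasible box $[l,u]^d$ has nonempty interior (because $l < 0 < u$), the sum rule applies and gives
$$\partial J(x) = \nabla f(x) + C\,\partial\|x\|_1 + \partial I_{(-\infty,u]^d}(x) + \partial I_{[l,+\infty)^d}(x).$$
The indicator subdifferentials are the normal cones of the boxes: for feasible $x$,
$$\partial I_{(-\infty,u]^d}(x) = \{\lambda^+ \geq 0 : \lambda_i^+ = 0 \text{ if } x_i < u\}, \qquad \partial I_{[l,+\infty)^d}(x) = \{-\lambda^- : \lambda^- \geq 0,\ \lambda_i^- = 0 \text{ if } x_i > l\}.$$
Substituting these into $0 \in \partial J(x)$ recovers exactly your multipliers and the two complementary slackness conditions.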

2) The KKT conditions of a convex program are always sufficient. They are necessary under a constraint qualification (here: validity of the subgradient sum rule, which holds in your case). A quick numerical sanity check is sketched below.
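
Here is a minimal numerical sketch of that check, assuming CVXPY and the illustrative choice $f(x) = \tfrac{1}{2}\|Ax - b\|^2$ with arbitrary $A$, $b$ (these names and the instance are hypothetical, not from the question): solve a small instance and verify that the solver's dual variables satisfy the KKT system above.

```python
import numpy as np
import cvxpy as cp

# Illustrative instance: f(x) = 0.5*||Ax - b||^2 is smooth and convex,
# matching the question's assumptions on f.
rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((8, d))
b = rng.standard_normal(8)
C, l, u = 0.5, -1.0, 1.0  # C > 0, l < 0 < u

x = cp.Variable(d)
con_u = x <= u  # its dual value plays the role of lambda^+
con_l = x >= l  # its dual value plays the role of lambda^-
prob = cp.Problem(
    cp.Minimize(0.5 * cp.sum_squares(A @ x - b) + C * cp.norm1(x)),
    [con_u, con_l],
)
prob.solve()

xs = x.value
lam_p, lam_m = con_u.dual_value, con_l.dual_value
grad = A.T @ (A @ xs - b)  # gradient of f at the optimum
tol = 1e-6

for i in range(d):
    if abs(xs[i]) > tol:
        # subgradient of |x_i| is sign(x_i); stationarity residual should vanish
        res = grad[i] + C * np.sign(xs[i]) + lam_p[i] - lam_m[i]
        assert abs(res) < 1e-4, res
    else:
        # at x_i = 0, some g_i in [-1, 1] must close the stationarity equation
        g = -(grad[i] + lam_p[i] - lam_m[i]) / C
        assert -1 - tol <= g <= 1 + tol, g
    # complementary slackness #1 and #2 (up to solver accuracy)
    assert abs(lam_p[i] * (xs[i] - u)) < 1e-4
    assert abs(lam_m[i] * (l - xs[i])) < 1e-4

print("KKT system verified up to solver accuracy.")
```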

3) Yes; it follows from stationarity combined with the (corrected) complementary slackness #2, as spelled out below.
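
In detail: since $u > 0$, $x_i = u$ implies $x_i > 0 > l$, so $l - x_i < 0$ and slackness #2 forces $\lambda_i^- = 0$; moreover $x_i > 0$ makes the subdifferential of $|x_i|$ the singleton $\{1\}$. The $i$-th stationarity equation therefore reduces to
$$(\nabla f(x))_i + C + \lambda_i^+ = 0.$$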
