[Math] Minimizing $L_1$ Regularization

Tags: derivatives, machine learning, matrices, vector analysis

I am given a high-dimensional input $x \in \mathbb{R}^m$ where $m$ is large. Linear regression can be applied, but in general many of these dimensions are expected to be irrelevant.

I need to find a method that models the function $y = f(x)$ and at the same time uncovers which dimensions contribute to the output. The given hint is to apply $L_1$-norm (lasso) regularization.

$$L^{\text{lasso}}(\beta) = \sum_{i=1}^n \left(y_i - \phi(x_i)^T \beta\right)^2 + \lambda \sum_{j=1}^k |\beta_j|$$
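For concreteness, here is a minimal NumPy sketch of this objective. The names `Phi`, `lam`, and the $n \times k$ layout of the feature matrix are my assumptions, not given in the question:

```python
import numpy as np

def lasso_loss(beta, Phi, y, lam):
    """Lasso objective: squared error plus lam * ||beta||_1.

    Phi : (n, k) matrix whose i-th row is phi(x_i)^T  (assumed layout)
    y   : (n,)  vector of targets
    """
    residuals = y - Phi @ beta              # y_i - phi(x_i)^T beta
    return residuals @ residuals + lam * np.abs(beta).sum()
```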

Minimizing $L^{\text{lasso}}$ is hard in general, so I should apply gradient descent. My approach so far is the following:

In order to minimize the term, I chose to compute the gradient and set it to $0$, i.e.

$$\frac{\partial}{\partial \beta} L^{\text{lasso}}(\beta) = -2 \sum_{i=1}^n \phi(x_i)(y_i - \beta)$$

Since this cannot be solved in closed form, I want to apply gradient descent here. Gradient descent takes a function $f(x)$ as input, and the $L(\beta)$ function will be put in its place.

My problems are:

  • is the gradient I computed correct? I left out the regularization term at the end, which I guess is a problem, but I also have trouble computing the gradient for that term

    In addition, I may replace $|\beta_j|$ by the function $l(x) = |x| - \varepsilon/2$ if $|x| \geq \varepsilon$, else $l(x) = x^2/(2\varepsilon)$

  • one step of gradient descent is $\beta \leftarrow \beta - \alpha \cdot \frac{g}{|g|}$, where $g = \left(\frac{\partial}{\partial \beta} f(x)\right)^T$

    I don't understand this step: when I transpose the gradient I have computed, the update cannot be evaluated due to a dimension mismatch (see the sketch after this list)
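To make both bullets concrete, here is a hedged NumPy sketch of the smoothed absolute value from the first point and the normalized update from the second; the names are illustrative, with `Phi` again assumed to be the $n \times k$ matrix whose rows are $\phi(x_i)^T$. Note that the gradient is simply a length-$k$ vector, so if $\beta$ and $g$ are both treated as such, no dimension mismatch arises:

```python
import numpy as np

def smooth_abs_grad(beta, eps):
    """Derivative of the surrogate l from the first bullet:
    l(x) = x**2 / (2*eps)  if |x| <  eps  ->  l'(x) = x / eps
    l(x) = |x| - eps / 2   if |x| >= eps  ->  l'(x) = sign(x)
    """
    return np.where(np.abs(beta) < eps, beta / eps, np.sign(beta))

def gradient_step(beta, Phi, y, lam, alpha, eps):
    """One normalized step from the second bullet: beta <- beta - alpha * g / |g|."""
    g = -2 * Phi.T @ (y - Phi @ beta) + lam * smooth_abs_grad(beta, eps)
    return beta - alpha * g / np.linalg.norm(g)
```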

Best Answer

The subgradient should be like this:

$$\frac{\partial}{\partial \beta} L^{\text{lasso}}(\beta) = -2 \sum_{i=1}^n \phi(x_i)\left(y_i - \phi(x_i)^T \beta\right) + \lambda \operatorname{sign}(\beta)$$
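As a hedged sketch of how this subgradient could be used, a plain subgradient-descent loop might look as follows (NumPy; `Phi`, `alpha`, and `iters` are illustrative names, and `np.sign(0) = 0` is a valid subgradient of $|\cdot|$ at $0$):

```python
import numpy as np

def lasso_subgradient_descent(Phi, y, lam, alpha=1e-3, iters=1000):
    """Minimize the lasso objective by subgradient descent, using
    g = -2 * Phi^T (y - Phi @ beta) + lam * sign(beta)."""
    beta = np.zeros(Phi.shape[1])
    for _ in range(iters):
        g = -2 * Phi.T @ (y - Phi @ beta) + lam * np.sign(beta)
        beta -= alpha * g
    return beta
```

With a diminishing step size (e.g. $\alpha_t \propto 1/\sqrt{t}$) the iterates approach a minimizer; a fixed $\alpha$ only reaches a neighborhood of it.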
