Optimizing logistic regression with a custom penalty using gradient descent

gradient descent, logistic, loss-functions, regression, regularization

I'm trying to fit a logistic regression model on a certain dataset. I want the learned model to be smooth, that is, samples which belong to the same cluster/group according to prior knowledge (encoded as a graph) should get similar outputs. I'm also looking for a model with sparse coefficients. Hence I have come up with the following loss function, which takes both requirements into account:

$$
\mathfrak{L}(\beta) = \underbrace{\sum_{i=1}^{m}\left[-y_{i}\beta^{T}x_{i} + \log\left(1 + e^{\beta^{T}x_{i}}\right)\right]}_{\text{log loss}} \; + \; \lambda_{1}\left\|\beta \right\|_{1} + \lambda_{2}f^{T}Lf
$$

In the above equation, the smoothness penalty is the $f^{T}Lf$ term, and $\lambda_{1}$ and $\lambda_{2}$ are the regularization weights. $L$ is the Laplacian matrix of the graph formed from the samples and $f = \operatorname{sigmoid}(\beta^{T}\mathbf{X})$, i.e. $f_{i} = \operatorname{sigmoid}(\beta^{T}x_{i})$. If the loss function consisted of only the log loss and the smoothness penalty, I could easily use gradient descent to optimize it, since both terms are differentiable (the quadratic form $f^{T}Lf$ is convex in $f$ because the graph Laplacian $L$ is positive semi-definite). But with the addition of the L1 penalty ($\left\|\beta \right\|_{1}$), the whole loss function cannot be optimized with plain gradient descent, as $\left\|\beta \right\|_{1}$ is not a smooth function.
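For concreteness, here is a minimal NumPy sketch of how this loss can be evaluated. The names (`X` with samples as rows, `y`, `beta`, `L_graph`, `lam1`, `lam2`) are placeholders I chose for illustration, not anything prescribed above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def custom_loss(beta, X, y, L_graph, lam1, lam2):
    """Log loss + L1 penalty + graph-smoothness penalty (illustrative sketch)."""
    z = X @ beta                           # linear scores, shape (m,)
    # log loss: sum_i [-y_i * z_i + log(1 + exp(z_i))], computed stably
    logloss = np.sum(-y * z + np.logaddexp(0.0, z))
    f = sigmoid(z)                         # fitted probabilities f_i
    smooth = f @ L_graph @ f               # smoothness penalty f^T L f
    return logloss + lam1 * np.sum(np.abs(beta)) + lam2 * smooth
```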

How can I use gradient descent in this case? Or are there alternative approaches to optimizing the above loss function?

Best Answer

I used the methodology in this paper, where they improve upon GLMNET to solve L1-regularized logistic regression. Since $f^{T}Lf$ is twice differentiable, I added its gradient and Hessian to the solution in equation 11 of the paper. Someone has already implemented the code for the paper here, which I've built upon. You can find my changes here.
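For what it's worth, here is a rough NumPy sketch of the gradient and Hessian of the $f^{T}Lf$ term with respect to $\beta$ (taking $f = \operatorname{sigmoid}(X\beta)$ with samples as rows); the function and variable names are mine, not from the paper or the linked code:

```python
import numpy as np

def smoothness_grad_hess(beta, X, L_graph):
    """Gradient and Hessian of S(beta) = f^T L f, with f = sigmoid(X @ beta)."""
    f = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted probabilities, shape (m,)
    d = f * (1.0 - f)                       # derivative of the sigmoid per sample
    Lf = L_graph @ f
    # gradient: 2 X^T (d * Lf)
    grad = 2.0 * X.T @ (d * Lf)
    # Hessian: 2 X^T [diag(d) L diag(d) + diag(Lf * d * (1 - 2f))] X
    DX = d[:, None] * X                     # diag(d) X
    hess = 2.0 * (DX.T @ L_graph @ DX
                  + X.T @ ((Lf * d * (1.0 - 2.0 * f))[:, None] * X))
    return grad, hess
```

These terms can be added to the usual logistic-loss gradient $X^{T}(f-y)$ and Hessian $X^{T}\operatorname{diag}(f(1-f))X$ when forming the smooth part of the objective, while the L1 term is handled by the coordinate-descent machinery in the paper.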