Working out the derivative of the log-likelihood for group LASSO

linear regression, logistic regression, regularization, statistics

I'm following the derivation of the sparse-group LASSO in the paper 'A Sparse-Group Lasso' by Simon et al. (2013). For the linear case, the problem is given as

$$\text{min}_\beta \frac{1}{2}||y-\sum_{l=1}^m X^{(l)}\beta^{(l)}||_2^2 + \text{pen}(\beta).$$

I've omitted the full details of the penalty term, as this question doesn't make use of it. In the paper, the problem is rewritten for logistic regression as

$$\text{min}_\beta \frac{1}{n}\left[ \sum_{i=1}^{n}\left(\log(1+\exp(x_i^T \beta))-y_ix_i^T \beta \right)\right] +
\text{pen}(\beta),$$

although this doesn't make the grouping structure explicit. Since the groups partition the predictors, so that $x_i^T \beta = \sum_{g=1}^{G} x_{i,g}^T \beta^{(g)}$, my thought was that it should be written as (as is done in 'The group lasso for logistic regression' by Meier et al., 2008)

$$\text{min}_\beta \frac{1}{n}\left[ \sum_{i=1}^{n}\left(\log(1+\exp(\sum_{g=1}^{G}x_{i,g}^T \beta^{(g)}))-y_i \sum_{g=1}^{G} x_{i,g}^T \beta^{(g)} \right)\right] +
\text{pen}(\beta).$$

Now to my question: In the Simon paper, the details of the logistic implementation aren't given aside from the problem statement. For the linear case, the log-likelihood is rewritten as

$$l(r_{(-k)},\beta) = \frac{1}{2n}||r_{(-k)} - X^{(k)}\beta||_2^2,$$
where $r_{(-k)} = y-\sum_{l\neq k} X^{(l)}\beta^{(l)}$ is the partial residual of $y$, subtracting all group fits other than group $k$. The paper then goes on to show

$$\nabla l(r_{(-k)}, \beta_0) = -X^{(k)\intercal} r_{(-k)}/n.$$
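
To convince myself of this in the linear case, I ran a quick numerical check. Below is a minimal NumPy sketch with made-up data (the names `X_k`, `r_mk` and the dimensions are purely illustrative, not from the paper) comparing $-X^{(k)\intercal} r_{(-k)}/n$ against a finite-difference gradient of $\frac{1}{2n}||r_{(-k)} - X^{(k)}\beta||_2^2$ at $\beta = 0$, where that expression is exact:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p_k = 50, 3                        # n examples, p_k predictors in group k

X_k = rng.normal(size=(n, p_k))       # design matrix X^(k) for group k
r_mk = rng.normal(size=n)             # partial residual r_(-k), held fixed

def loss(beta_k):
    """l(r_(-k), beta) = 1/(2n) * ||r_(-k) - X^(k) beta||_2^2"""
    return 0.5 / n * np.sum((r_mk - X_k @ beta_k) ** 2)

# Analytic gradient at beta = 0:  -X^(k)^T r_(-k) / n
grad_analytic = -X_k.T @ r_mk / n

# Central finite differences at beta = 0 for comparison
eps = 1e-6
beta0 = np.zeros(p_k)
grad_numeric = np.array([
    (loss(beta0 + eps * e) - loss(beta0 - eps * e)) / (2 * eps)
    for e in np.eye(p_k)
])

print(np.allclose(grad_analytic, grad_numeric))   # True
```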

The paper states that for logistic regression, we must find the unpenalised loss function $l(\beta)$ as a function of only $\beta^{(k)}$, with the rest of the coefficients $\beta^{(-k)}$ fixed, so that we define it as $l_k(\beta^{(-k)}, \beta^{(k)}).$

My question is: How do I find this log-likelihood and derivative in the logistic regression case?

Best Answer

I will write the log-likelihood (to be maximized) as $$ \phi(\mathbf{b}) =\sum_i y_i(c_i+ \mathbf{x}_{i,k}^T \mathbf{b}) - \log \left[ 1+\exp(c_i+\mathbf{x}_{i,k}^T \mathbf{b}) \right], $$ where $\mathbf{x}_{i,k}$ is the restriction of the predictors to group $k$ in the $i$-th example and $c_i = \sum_{g \neq k} \mathbf{x}_{i,g}^T \beta^{(g)}$ is the fixed offset contributed by all the other groups. Note the sign: this is the log-likelihood to be maximized, i.e. the negative of the loss being minimized in your problem statement (up to the $1/n$ factor).
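
For the derivative, the only ingredient beyond the chain rule is that the log-partition term differentiates to the logistic (sigmoid) function $\sigma(z) = 1/(1+e^{-z})$:

$$ \frac{\partial}{\partial \mathbf{b}} \log\left[1+\exp(c_i+\mathbf{x}_{i,k}^T \mathbf{b})\right] = \frac{\exp(c_i+\mathbf{x}_{i,k}^T \mathbf{b})}{1+\exp(c_i+\mathbf{x}_{i,k}^T \mathbf{b})}\,\mathbf{x}_{i,k} = \sigma\left(c_i+\mathbf{x}_{i,k}^T \mathbf{b}\right)\mathbf{x}_{i,k}, $$

while the first term simply differentiates to $y_i\,\mathbf{x}_{i,k}$.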

Differentiating term by term and summing over $i$ yields $$ \frac{\partial \phi}{\partial \mathbf{b}} = \sum_i \left( y_i - \sigma \left[ c_i+\mathbf{x}_{i,k}^T \mathbf{b} \right] \right) \mathbf{x}_{i,k}, $$ where $\sigma$ is the logistic function used to model $p(y=1\mid\mathbf{x})=\sigma(\mathbf{x}^T\beta)$.
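
As a sanity check, here is a small NumPy sketch (made-up data; the variable names and sizes are illustrative only, not from the paper) comparing the analytic gradient above with a finite-difference approximation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p_k = 40, 3

X_k = rng.normal(size=(n, p_k))            # predictors restricted to group k
c = rng.normal(size=n)                     # fixed offsets c_i from the other groups
y = rng.integers(0, 2, size=n).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loglik(b):
    """phi(b) = sum_i [ y_i (c_i + x_{i,k}^T b) - log(1 + exp(c_i + x_{i,k}^T b)) ]"""
    eta = c + X_k @ b
    return np.sum(y * eta - np.log1p(np.exp(eta)))

b = rng.normal(size=p_k)                   # arbitrary point at which to evaluate the gradient

# Analytic gradient: sum_i (y_i - sigma(c_i + x_{i,k}^T b)) x_{i,k}
grad_analytic = X_k.T @ (y - sigmoid(c + X_k @ b))

# Central finite differences for comparison
eps = 1e-6
grad_numeric = np.array([
    (loglik(b + eps * e) - loglik(b - eps * e)) / (2 * eps)
    for e in np.eye(p_k)
])

print(np.allclose(grad_analytic, grad_numeric, atol=1e-6))   # True
```

Dividing $-\phi$ by $n$ recovers the minimized loss $l_k$ of the question, so (with $c$ the vector of offsets $c_i$) $\nabla_{\beta^{(k)}} l_k = -X^{(k)\intercal}\left(y-\sigma(c+X^{(k)}\beta^{(k)})\right)/n$, which mirrors the linear-case expression $-X^{(k)\intercal} r_{(-k)}/n$ with $y - \sigma(\cdot)$ playing the role of the partial residual.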
