I am trying to figure out how the intercept is calculated for logistic regression lasso using coordinate descent algorithm based on this seminal paper:
Friedman, J., Hastie, T. & Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of statistical software 33, 1-22 (2010).
In the case of linear regression, the authors clearly stated that the intercept is $\hat{\beta}_0=\bar{y}$, which is very obvious to see. Then, in the logistic regression case, they defined a "working response" $z_i$ which they simply substituted for the $y_i$ for the coordinate update in the linear regression case. However, unlike linear regression, it seems to me that $\beta_0$ also needs to be updated in each loop since this "working response" directly involves both the intercept and all coefficients. The author didn't seem to give a clear clarification on how to do this. So my questions is: how is the intercept updated for the logistic regression? Do I just simply replace $y_i$ with $z_i$ and use $\hat{\beta}_0=\bar{z}$? If so, what's the mathematical justification for it? I'm not familiar with Fortran and their core code for logistic regression was written without any comment/explanation and is just too much for me to wade through.
Thanks.
Best Answer
I think your suspicion is correct - the intercept term does need to be updated for all glmnets except for elastic net.
Here's a link to the glmnet fortran code. The function that fits logistic nets begins on line 2039. If you begin there, and search downwards for
a0
, which is the vector containing the intercept values for each $\lambda$:you will get to:
k
is the lambda index,ca
is the vector of all coefficients:And I believe
xm
is the sample weighted design matrix:This looks a lot like an update rule for the intercept term, which should be of the same form as the other coefficients.
I feel your pain in looking at the FORTRAN code. It scares the crap out of me.