Using glmnet is really easy once you get the hang of it, thanks to its excellent vignette at http://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html (you can also check the CRAN package page).
As for the best lambda for glmnet, the rule of thumb is to use

cvfit <- glmnet::cv.glmnet(x, y)   # k-fold cross-validation over the lambda path
coef(cvfit, s = "lambda.1se")      # largest lambda within 1 SE of the minimum CV error

instead of lambda.min.
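For context, lambda.min minimizes the cross-validated error, while lambda.1se is the largest (most regularized) lambda whose error is within one standard error of that minimum. A quick way to see the difference is to count the selected variables under each choice; here is a minimal sketch on simulated data (xsim and ysim are made-up names used only for this illustration):

set.seed(1)
xsim <- matrix(rnorm(100 * 20), 100, 20)                 # simulated predictors
ysim <- drop(xsim[, 1:3] %*% c(2, -1, 1)) + rnorm(100)   # only 3 true signals
cvfit <- glmnet::cv.glmnet(xsim, ysim)
sum(as.numeric(coef(cvfit, s = "lambda.min"))[-1] != 0)  # usually keeps more variables
sum(as.numeric(coef(cvfit, s = "lambda.1se"))[-1] != 0)  # sparser, more conservative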
To do the same for lars you have to do it by hand. Here is my solution:

# cross-validate over the lasso steps (one per knot)
cv <- lars::cv.lars(x, y, plot.it = FALSE, mode = "step")
# first step whose CV error is within one standard error of the minimum
idx <- which.max(cv$cv - cv$cv.error <= min(cv$cv))
coef(lars::lars(x, y))[idx, ]
Bear in mind that this is not exactly the same, because it stops at a lasso knot (where a variable enters the model) instead of at an arbitrary point along the path.
Please note that glmnet is the preferred package now; it is actively maintained, more so than lars, and questions about glmnet vs lars have been answered before (the algorithms they use differ).
As for your question of using the lasso to choose variables and then fitting OLS, it is an ongoing debate. Google for "OLS post Lasso" and you will find some papers discussing the topic. Even the authors of Elements of Statistical Learning admit it is possible.
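If you want to try that approach, here is a minimal sketch (an illustration of the mechanics, not an endorsement of the method; it assumes x is a matrix with column names):

# select variables at lambda.1se, then refit OLS on the selected support
cvfit <- glmnet::cv.glmnet(x, y)
beta  <- coef(cvfit, s = "lambda.1se")
keep  <- rownames(beta)[as.numeric(beta) != 0]
keep  <- setdiff(keep, "(Intercept)")   # drop the intercept from the selection
ols   <- lm(y ~ x[, keep])              # plain OLS on the lasso-selected columns
summary(ols)

Note that the standard errors reported by summary(ols) ignore the selection step, which is exactly what the debate is about.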
Edit: Here is the code to reproduce more accurately what glmnet does in lars:

# cross-validate over the fraction of the final L1 norm (the default mode)
cv <- lars::cv.lars(x, y, plot.it = FALSE)
# smallest fraction whose CV error is within one standard error of the minimum
ideal_l1_ratio <- cv$index[which.max(cv$cv - cv$cv.error <= min(cv$cv))]
obj <- lars::lars(x, y)
# rescale the path coefficients so their L1 norms are comparable to cv.lars's fractions
scaled_coefs <- scale(obj$beta, FALSE, 1 / obj$normx)
l1 <- apply(X = scaled_coefs, MARGIN = 1, FUN = function(x) sum(abs(x)))
# first step whose L1 ratio exceeds the cross-validated ideal ratio
coef(obj)[which.max(l1 / tail(l1, 1) > ideal_l1_ratio), ]
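If you want to sanity-check the two packages against each other, you can put the coefficient vectors side by side (a rough sketch reusing obj, l1 and ideal_l1_ratio from above; the fits will not match exactly, since the solvers and lambda grids differ, but they should select similar variables):

cvg <- glmnet::cv.glmnet(x, y)
cbind(
  glmnet = as.numeric(coef(cvg, s = "lambda.1se"))[-1],  # drop the intercept
  lars   = coef(obj)[which.max(l1 / tail(l1, 1) > ideal_l1_ratio), ]
)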
Best Answer
There is a package in R called glmnet that can fit a LASSO logistic model for you! This will be more straightforward than the approach you are considering. More precisely, glmnet fits the elastic net, a hybrid between LASSO and ridge regression, but you may set the parameter $\alpha=1$ to get a pure LASSO model. Since you are interested in logistic regression, you will set family="binomial".
You can read more here: http://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html#intro
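Here is a minimal sketch, assuming x is your predictor matrix and y is a binary 0/1 (or two-level factor) response:

library(glmnet)
# family = "binomial" gives logistic regression; alpha = 1 is the pure LASSO penalty
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 1)
coef(cvfit, s = "lambda.1se")                                  # selected coefficients
predict(cvfit, newx = x, s = "lambda.1se", type = "response")  # fitted probabilities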