Using LASSO for variable selection, then using Logit

lasso, logit, model selection

I know this would muddy the statistical inference, but I am really only concerned with getting as close to an accurate model as I can.

I have a dichotomous outcome variable and a large set of dichotomous predictors. I would like to use LASSO to select which variables to include in my model, then fit a Logit regression on the selected variables.

Is there something I am overlooking when it comes to the practicality of this approach?

Best Answer

There is an R package called glmnet that can fit a LASSO logistic model directly, which is more straightforward than the two-step approach you are considering. More precisely, glmnet fits the elastic net, a penalty that mixes the LASSO and ridge penalties; setting the mixing parameter $\alpha=1$ (the default) gives a pure LASSO fit, while $\alpha=0$ gives pure ridge. Since you are interested in logistic regression, set family="binomial".

You can read more here: http://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html#intro
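As a rough illustration of the advice above, here is a minimal sketch in R. The data are simulated and purely hypothetical (the names x, y, n, p and the effect sizes are assumptions, not from the question), and choosing the penalty by cross-validation with cv.glmnet at lambda.1se is one common choice rather than the only one. The last step shows the refit described in the question, included only for comparison.

```r
library(glmnet)

set.seed(1)

# Hypothetical data: n observations, p binary predictors
n <- 500
p <- 30
x <- matrix(rbinom(n * p, 1, 0.5), nrow = n, ncol = p)
colnames(x) <- paste0("x", seq_len(p))

# Assumption for the simulation: only x1, x2, x3 affect the outcome
eta <- -1 + 1.5 * x[, "x1"] - 1.5 * x[, "x2"] + 1.0 * x[, "x3"]
y <- rbinom(n, 1, plogis(eta))

# LASSO-penalized logistic regression; alpha = 1 is the pure LASSO penalty.
# cv.glmnet picks the penalty strength lambda by cross-validation.
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1)

# Coefficients at the more conservative lambda; nonzero ones are "selected"
coefs <- as.matrix(coef(cv_fit, s = "lambda.1se"))[, 1]
selected <- setdiff(names(coefs)[coefs != 0], "(Intercept)")
print(selected)

# Predicted probabilities straight from the penalized fit
p_hat <- predict(cv_fit, newx = x, s = "lambda.1se", type = "response")

# Two-step variant from the question: unpenalized logit on selected columns.
# Standard errors and p-values from this refit ignore the selection step.
refit_data <- data.frame(y = y, x[, selected, drop = FALSE])
refit <- glm(y ~ ., data = refit_data, family = binomial)
```

The penalized fit can be used for prediction as is, so the refit is optional; if prediction accuracy is the only goal, comparing the two on held-out data is a reasonable way to decide.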
