Solved – Elastic net produces complex output with too many non-zero coefficients

Tags: elastic net, glmnet, lasso, regression, ridge regression

I have run 3-fold cross-validation for the elastic net using the elasticnet R function on ~200 observations and 80 variables (with more to come).

Both lasso and ridge tend to select over 40 variables with non-zero coefficients for the final model. I doubt I need more than 10-15; anything beyond that I consider overfitting. How can I force the elastic net to simplify the output and not include the redundant "tail"?

Best Answer

It's not immediately clear what exactly you're doing when fitting a model or what your goal is. I'll answer as best I can with the information provided.

GLMNET has two tuning parameters. A sequence of $\lambda$s is generated internally; the user supplies a value of $\alpha$.
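For concreteness, here is a minimal R sketch of what those two knobs look like in a glmnet call. The matrix `x`, the response `y`, and the value $\alpha = 0.5$ are placeholders for this example, not taken from the question.

```r
library(glmnet)

# Placeholder data roughly matching the question's dimensions (~200 obs, 80 vars).
set.seed(1)
x <- matrix(rnorm(200 * 80), nrow = 200)
y <- rnorm(200)

# The user supplies alpha (the lasso/ridge mixing parameter);
# the lambda sequence is generated internally by glmnet.
fit <- glmnet(x, y, alpha = 0.5)
head(fit$lambda)   # internally generated lambdas, stored largest to smallest
```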

The stated question is how to choose a GLMNET model that has 10-15 predictors. The number of nonzero predictors is tracked by the software, so for the supplied value of $\alpha$, just pick the solution corresponding to a $\lambda$ value that gives the desired number of predictors. On the assumption that the supplied value of $\alpha$ is "known," you're done. If you're uncertain about $\alpha$ (perhaps because you also want to account for collinearity), you'll have to tune over $\alpha$ and compare alternative models according to some appropriate out-of-sample metric in the usual way.
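As an illustration (one reasonable way to do it, not the only one), here is a sketch of both steps: reading a solution with at most 15 nonzero coefficients off the fitted path via the `df` component, and a crude tuning loop over a few $\alpha$ values with `cv.glmnet`. The simulated data, the $\alpha$ grid, and the cap of 15 predictors are all assumptions made for the example.

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(200 * 80), nrow = 200)   # placeholder data, as above
y <- rnorm(200)

fit <- glmnet(x, y, alpha = 0.5)

# fit$df gives the number of nonzero coefficients at each value of fit$lambda
# (lambdas are in decreasing order, so df generally grows along the path).
# Pick the densest solution on the path that still uses at most 15 predictors.
idx <- max(which(fit$df <= 15))
lambda_chosen <- fit$lambda[idx]
coef(fit, s = lambda_chosen)               # sparse coefficients, <= 15 nonzero

# If alpha is not "known", tune over it and compare models out of sample,
# e.g. by 3-fold cross-validated error as in the question.
for (a in c(0.2, 0.5, 0.8, 1)) {
  cvfit <- cv.glmnet(x, y, alpha = a, nfolds = 3)
  cat("alpha =", a, "  CV error at lambda.min:", min(cvfit$cvm), "\n")
}
```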

You may also be interested in my answer here. It's worth noting that that answer is controversial among several highly-ranked CV contributors, and I'm not certain of the correct way to approach the issue.