Solved – glmnet in R: Selecting the right $\alpha$

glmnet

I was reading the following link. One of the sections discusses selecting a value for $\alpha$.

Looking at the bottom-right plot below (which contains MSE curves for 3 different values of $\alpha$), it seems to me that $\alpha = 0$ gives the lowest MSE and hence should be the best here. However, the image below says that $\alpha=1$ is the best instead.

Question: I do not understand why $\alpha=1$ is the best, because it seems to me that $\alpha=0$ yields the lowest MSE for all values of $\log(\lambda)$.

My interpretation is that the higher the value of $\lambda$, the more penalization there is. And if a model yields a lower MSE than the other models at the same penalty factor (i.e., the same $\lambda$), then surely it should be the best model. Am I wrong?

What did I misunderstand here?

Image 1: cross-validation MSE curves for the three values of $\alpha$.

Best Answer

If low MSE is your goal, go with $\alpha=0$ and a small value of $\lambda$ (s = lambda.1se, s = lambda.min, or even something smaller). If your goal is a simpler model (with fewer than 20 variables), then you could tune $\lambda$ using the cross-validation plots along with your preference for model complexity.
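For reference, the two named choices have simple definitions: lambda.min is the $\lambda$ with the smallest mean cross-validated error, and lambda.1se is the largest $\lambda$ whose mean error is within one standard error of that minimum. A minimal sketch of the rules in plain Python (not glmnet's code; the CV numbers here are invented for illustration):

```python
# Sketch of glmnet's lambda.min / lambda.1se selection rules.
# The lambda grid and CV errors below are made up for illustration.
lambdas = [1.0, 0.5, 0.1, 0.05, 0.01]   # decreasing penalty strength
cv_mean = [3.0, 2.2, 1.5, 1.4, 1.45]    # mean CV MSE at each lambda
cv_se   = [0.3, 0.25, 0.2, 0.2, 0.2]    # standard error of each mean

def select_lambdas(lambdas, cv_mean, cv_se):
    # lambda.min: smallest mean CV error
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    lambda_min = lambdas[i_min]
    # lambda.1se: largest lambda within one SE of the minimum error
    threshold = cv_mean[i_min] + cv_se[i_min]
    lambda_1se = max(l for l, m in zip(lambdas, cv_mean) if m <= threshold)
    return lambda_min, lambda_1se

lambda_min, lambda_1se = select_lambdas(lambdas, cv_mean, cv_se)
print(lambda_min, lambda_1se)  # -> 0.05 0.1
```

lambda.1se trades a little CV error for a sparser, more heavily penalized model, which is why it is the conventional choice when you want a simpler fit.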

I'm guessing you have enough data relative to your model that regularization is not especially beneficial. In all plots above, the cross-validation results are telling the same story: "the smaller the lambda the better." If you extrapolate that curve out, then you don't have any regularization at all and you're back to ordinary regression with 20 variables. If I had to guess, the full 20 variable model really is the best for your situation, which is why $\alpha = 0$ with a very small $\lambda$ is giving you the best MSE results - it keeps all the variables and applies very little regularization (i.e., bias).
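The extrapolation argument can be made concrete in the one-predictor case, where ridge regression has a closed form: $\hat\beta_{\text{ridge}} = \sum x_i y_i / (\sum x_i^2 + \lambda)$. As $\lambda \to 0$ this converges to the OLS slope, so a CV curve that keeps improving as $\lambda$ shrinks is pointing back toward the unregularized fit. A toy check in plain Python (data invented):

```python
# One-predictor ridge slope: beta = sum(x*y) / (sum(x^2) + lam).
# As lam -> 0 it converges to the OLS slope sum(x*y) / sum(x^2).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]  # roughly y = 2x; numbers invented

sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)

def ridge_slope(lam):
    return sxy / (sxx + lam)

beta_ols = sxy / sxx
for lam in (10.0, 1.0, 0.1, 0.0):
    print(lam, ridge_slope(lam))
# Larger lam shrinks the slope toward zero (more bias);
# lam = 0.0 reproduces the OLS slope exactly.
```

This is the sense in which a very small $\lambda$ "keeps all the variables and applies very little regularization."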

For reasons I don't fully understand, LASSO ($\alpha = 1$) stops short of 20 variables (even for s = lambda.min), though the curve appears to still be decreasing. Perhaps the defaults are set so that variable selection actually happens, since that is presumably what a user choosing $\alpha=1$ wants.
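Why LASSO drops variables at all (and ridge never does) comes down to the coordinate-wise update each penalty implies: the $\ell_1$ penalty soft-thresholds a coefficient, setting it exactly to zero once its unpenalized value falls below $\lambda$, while the $\ell_2$ penalty only rescales it. A minimal scalar sketch (glmnet's actual coordinate descent is more involved):

```python
def soft_threshold(z, lam):
    """L1 (lasso) proximal update: exactly zero whenever |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def ridge_shrink(z, lam):
    """L2 (ridge) update: rescales z but never reaches exactly zero."""
    return z / (1.0 + lam)

for z in (2.0, 0.3, -0.1):
    print(soft_threshold(z, 0.5), ridge_shrink(z, 0.5))
# soft_threshold sends 0.3 and -0.1 to exactly 0.0 (variable dropped);
# ridge_shrink leaves every nonzero coefficient nonzero.
```

So at any $\lambda$ large enough to matter, $\alpha=1$ will have zeroed out some coefficients, which is exactly the variable selection the user is asking for by choosing LASSO.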