Solved – Elastic net regularization: mean square error monotonically increases with lambda

Tags: elastic-net, glmnet, lasso, regularization

This is quite a coincidence, as my question is nearly identical to this one asked shortly before: I am also using elastic net regularization with R's glmnet library as a method of variable selection, although in my case it is for a Gaussian rather than a binomial family.

I have an instance where there does not appear to be any value of $\lambda$ which reduces the deviance of the fit. In the left figure below, the $\lambda$ values are selected automatically by cv.glmnet(); in the right figure below I have specified them myself as $\exp(s)$ for 250 points $s$ evenly spaced between $-11$ and $-1$. In either case the deviance, here the mean square error (MSE), appears to be monotonically increasing with $\lambda$.
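(For concreteness, a minimal sketch of how such a grid can be passed to cv.glmnet(); here x and y stand in for my predictor matrix and response, and the alpha value is just illustrative:)

```r
library(glmnet)

## Manual lambda grid: 250 values exp(s) for s evenly spaced
## between -1 and -11; glmnet prefers a decreasing sequence.
lambda_grid <- exp(seq(-1, -11, length.out = 250))

## x: numeric predictor matrix, y: numeric response (placeholders).
## alpha = 0.5 is an arbitrary elastic net mixing value for illustration.
cvfit <- cv.glmnet(x, y, family = "gaussian",
                   alpha = 0.5, lambda = lambda_grid)

plot(cvfit)  # cross-validated MSE versus log(lambda)
```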

Does this say something useful about my system, and is there any value of $\lambda$ for which a meaningful interpretation of the selected variables can be extracted?

I would additionally appreciate references to peer-reviewed literature or conference proceedings which discuss similar cases – I have come up empty-handed in my search.

Many thanks, community.

[Figure: cross-validated MSE versus $\log(\lambda)$, as produced by plot() on a cv.glmnet fit, for the automatically selected (left) and manually specified (right) $\lambda$ sequences.]

(For non-R users: in each figure, the left dotted vertical line marks the $\lambda$ giving the minimum MSE – which in my problem is the smallest $\lambda$ in the sequence in both cases – and the right dotted vertical line marks the $\lambda$ of the minimum-MSE + 1 standard error solution.)
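(These two $\lambda$ values can also be read directly off the fitted object; a sketch, assuming the fit from the code above is stored in cvfit:)

```r
cvfit$lambda.min  # lambda at the minimum cross-validated MSE (left line)
cvfit$lambda.1se  # largest lambda with MSE within 1 SE of the minimum (right line)
```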

Best Answer

The interpretation is exactly the same as I discussed in the Q&A you link to. All your right-hand plot has done is extend the penalty axis ($\log(\lambda)$) into a region of exponentially smaller $\lambda$ values, i.e. exponentially smaller amounts of shrinkage.

The CV deviance of the model is flat over a range of values of $\log(\lambda)$, yet shrinkage is being applied over that range; sufficient shrinkage, in fact, to remove some variables from the model completely. I'm not sure you can trust the covariate counts on the upper axis of the automatic (left-hand) plot - do you really have 1636 covariates in the model? - but it suggests the simplest/smallest model within 1 SE of the best model has somewhere between 87 and 100 covariates in it.
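Rather than reading these counts off the upper axis, you can check them directly from the fitted object; a sketch, assuming your cv.glmnet() fit is stored in cvfit:

```r
## Number of non-zero coefficients (intercept excluded) at the
## minimum-MSE and 1-SE choices of lambda.
nnz <- function(fit, s) {
  b <- coef(fit, s = s)
  sum(b[-1] != 0)  # first row of the coefficient matrix is the intercept
}
nnz(cvfit, "lambda.min")
nnz(cvfit, "lambda.1se")
```

(Equivalently, cvfit$nzero holds these counts for every $\lambda$ in the sequence.)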

In your situation, the optimal amount of shrinkage gives essentially the full model; but a model that does just as well is the 1 SE model. That model has some shrinkage applied, though not a large amount; there will still be many predictors with non-zero coefficients.