Solved – cv.glmnet Ridge Regression lambda.min = lambda.1se

cross-validation, glmnet, r, ridge regression

I'm currently running a ridge regression in R using the glmnet package. I recently ran into a new problem and was hoping for some help interpreting my results. My data can be found here: https://www.dropbox.com/sh/hpxu3t0vqkrzfgf/AAB6F-yMYMfuI5E__gfDuW6sa?dl=0

My data consists of a 26531×428 observation matrix x and a 26531×1 response vector y. I am attempting to determine the optimal value of lambda, and when I run the code

> lambda=cv.glmnet(x=x,y=y,weights=weights,alpha=0,nfolds=10,standardize=FALSE)

I get

$lambda.min
[1] 2.123479
$lambda.1se
[1] 619.0054

which are results I would expect. However, I would like to add a slight tweak to this regression. I have prior knowledge of each of my 428 coefficients, and instead of shrinking each coefficient towards 0, as is the default with ridge regression, I would like to shrink each coefficient towards a specific value other than 0. I reached out to Dr. Trevor Hastie, one of the creators of glmnet, and he told me that this could be achieved by running the same code after substituting y with y2, where y2 = y - x%*%d and d is a 428×1 vector of coefficient priors. He said to then add d to the new coefficients, which would give me my prior-informed coefficients. After rerunning the code

> lambda=cv.glmnet(x=x,y=y2,weights=weights,alpha=0,nfolds=10,standardize=FALSE)

I unfortunately get

$lambda.min
[1] 220.3026
$lambda.1se
[1] 220.3026

The results of plot(lambda) look like this:
[Figure: lambda plot]

Does anyone know why glmnet can't find a suitable lambda.min? Could it be because my vector of priors contains estimates that are too far off? Any help would be greatly appreciated!

Best Answer

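glmnet is returning a very large optimal regularization parameter, i.e., it is shrinking all of your coefficients essentially to zero. It looks like glmnet is telling you that, after accounting for your prior (or offset) coefficients, what is left is noise. That is, your offset already captures the correct coefficients, and the model is just validating that. If the cross-validated error is lowest at the largest lambda on the path, no larger lambda remains for the one-standard-error rule to pick, which would explain why lambda.min and lambda.1se coincide.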
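To make the offset argument concrete, here is a minimal base-R sketch on simulated data. It is not the asker's data and not glmnet's exact objective: it uses the closed-form ridge solution with no intercept and no observation weights, and the helper name ridge_toward, the sizes n and p, and the prior d are all made up for illustration. Writing gamma = beta - d turns a penalty on (beta - d) into an ordinary ridge penalty on gamma with response y2 = y - x %*% d, so fitting on y2 and adding d back should reproduce the fit that shrinks toward d directly. With a very heavy penalty, gamma collapses toward zero and the final coefficients are just the prior.

```r
# Simulated data; all sizes and values here are hypothetical.
set.seed(42)
n <- 200
p <- 5
x <- matrix(rnorm(n * p), n, p)
beta_true <- c(1.5, -2, 0.5, 3, -1)
y <- drop(x %*% beta_true + rnorm(n))

# Hypothetical prior that is already close to the true coefficients.
d <- beta_true + rnorm(p, sd = 0.05)

# Closed-form ridge that shrinks toward `target` instead of 0 (no intercept):
# minimizes sum((y - x %*% b)^2) + lambda * sum((b - target)^2).
ridge_toward <- function(x, y, lambda, target) {
  k <- ncol(x)
  drop(solve(crossprod(x) + lambda * diag(k),
             crossprod(x, y) + lambda * target))
}

lambda <- 10

# Route 1: shrink toward d directly.
beta_direct <- ridge_toward(x, y, lambda, target = d)

# Route 2: ordinary ridge on y2 = y - x %*% d, then add d back.
y2 <- drop(y - x %*% d)
gamma <- ridge_toward(x, y2, lambda, target = rep(0, p))
beta_shifted <- gamma + d

# The two routes agree up to numerical tolerance.
equal_fits <- isTRUE(all.equal(beta_direct, beta_shifted))

# Under a very large penalty, gamma is shrunk essentially to zero,
# so the prior-informed coefficients reduce to d itself.
gamma_big <- ridge_toward(x, y2, 1e8, target = rep(0, p))
max_abs_gamma_big <- max(abs(gamma_big))
```

On a fitted cv.glmnet object such as the lambda object in the question, comparing lambda$lambda.min with max(lambda$lambda) shows whether the selected value sits at the edge of the lambda path, which is the situation described above.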