I'm currently running a ridge regression in R using the glmnet package, but I recently ran into a new problem and was hoping for some help interpreting my results. My data can be found here: https://www.dropbox.com/sh/hpxu3t0vqkrzfgf/AAB6F-yMYMfuI5E__gfDuW6sa?dl=0
My data consist of a 26531×428 observation matrix x and a 26531×1 response vector y. I am attempting to determine the optimal value of lambda (lambda.min), and when I run the code
> lambda=cv.glmnet(x=x,y=y,weights=weights,alpha=0,nfolds=10,standardize=FALSE)
I get
$lambda.min
[1] 2.123479
$lambda.1se
[1] 619.0054
which are results I would expect. However, I would like to add a slight tweak to this regression. I have prior knowledge of each of my 428 coefficients, and instead of shrinking each coefficient towards 0, as is the default in ridge regression, I would like to shrink each coefficient towards a specific nonzero value. After reaching out to Dr. Trevor Hastie, one of the creators of glmnet, I learned that this can be achieved by running the same code with y replaced by y2, where y2 = y - x %*% d and d is a 428×1 vector of coefficient priors. Adding d back to the new coefficients then gives the prior-informed coefficients. After rerunning the code
> lambda=cv.glmnet(x=x,y=y2,weights=weights,alpha=0,nfolds=10,standardize=FALSE)
I unfortunately get
$lambda.min
[1] 220.3026
$lambda.1se
[1] 220.3026
The results of plot(lambda) look like this: [cross-validation plot not reproduced here]
Does anyone know why glmnet can't find a suitable lambda.min? Could it be because my vector of priors contains estimates that are too far off? Any help would be greatly appreciated!
Best Answer
glmnet is returning a very large optimal regularization parameter, i.e., it is regularizing away all of your coefficients. It looks like glmnet is telling you that, after accounting for your prior (or offset) coefficients, what is left is noise: your prior d already captures the correct coefficients, and cross-validation is simply confirming that the best remaining fit is (near-)zero adjustment on top of d.
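To see why this behavior is expected rather than an error, note the algebra behind the offset trick: fitting ridge on y2 = y - X d and adding d back is exactly the same as shrinking the coefficients toward d instead of toward 0. The sketch below checks this numerically (in Python/NumPy for a self-contained closed-form ridge, not glmnet; the data, the prior vector d, and the penalty value lam are made-up illustrations). When the prior is close to the truth, the residual response y2 is essentially noise, so cross-validation will push the penalty as high as the grid allows.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))

# Hypothetical prior coefficients d, with true coefficients very close to them,
# mimicking the situation where the asker's priors are nearly correct.
d = np.array([2.0, -1.0, 0.5, 3.0, -2.0])
beta_true = d + rng.normal(scale=0.05, size=p)
y = X @ beta_true + rng.normal(scale=1.0, size=n)

lam = 10.0
I = np.eye(p)

# Route 1 (the suggested trick): ridge toward 0 on the offset response,
# then add the prior back.
y2 = y - X @ d
beta_offset = np.linalg.solve(X.T @ X + lam * I, X.T @ y2) + d

# Route 2: shrink directly toward d, i.e. minimize
#   ||y - X b||^2 + lam * ||b - d||^2,
# whose normal equations are (X'X + lam*I) b = X'y + lam*d.
beta_prior = np.linalg.solve(X.T @ X + lam * I, X.T @ y + lam * d)

# The two routes are algebraically identical.
print(np.allclose(beta_offset, beta_prior))
```

Because the two solutions coincide for every lambda, a huge lambda on the offset problem simply means "keep the coefficients at d", which is exactly the right answer when the priors are accurate.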