GLMNet – Why Is cv.glmnet Giving a Lambda.min That Is Clearly Not the Lambda for Minimum Error?

glmnet, overfitting

I have a pool of possible predictors X for a response Y. In my case the number of predictors is far larger than the number of observations.

I have noticed in my runs of cv.glmnet (leave-one-out, all other parameters left at their defaults) that predicting with lambda.min simply returns the mean value of Y. If I run the prediction with a choice of lambda < lambda.min, it gives actual predictions, and these have a lower error than using the mean value of Y.

I'm not sure what's going on here. It's as if the code is defaulting to a dummy predictor (the mean response) for some reason, and this behavior seems to depend on the size of X.

Here's a simple example:

library(glmnet)

x = replicate(100, rnorm(10))          # 10 observations, 100 predictors
y = replicate(1, rnorm(10))            # pure-noise response, unrelated to x
cvfit = cv.glmnet(x, y, nfolds = 10)   # n = 10, so 10 folds is leave-one-out
ypred1 = predict(cvfit, newx = x, s = "lambda.min")

(In the case I just ran, this gives cvfit$lambda.min = 0.8453387, and every entry of ypred1 is the mean value of y. So let's choose a smaller lambda.)

ypred2 = predict(cvfit, newx = x, s = 0.1)

mse1 = mean((ypred1 - y)^2)   # 1.20 in my run
mse2 = mean((ypred2 - y)^2)   # 0.03 in my run

I understand that "newx=x" doesn't make sense for any real work, but I don't understand why it returns the predictions it does.
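
For what it's worth, inspecting the coefficients (same cvfit object as above; exact values vary from run to run) shows that at lambda.min every slope is shrunk exactly to zero, so all that remains is the intercept, which equals the mean of y:

coef(cvfit, s = "lambda.min")   # only the (Intercept) row is nonzero
mean(y)                         # matches the intercept, and every entry of ypred1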

Best Answer

Here, glmnet is working as intended! In your example there is no relationship between $x$ and $y$ (both were generated independently), so the "correct" thing to do is to always predict $\hat{y} = \bar{y}.$ Any method that isn't doing that is overfitting the training data, which in your check also serves as the test set, since you predict with newx = x.
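
You can read this off the cross-validation output directly. Below is a minimal sketch along the lines of the question's own example (the seed is arbitrary and the numbers will vary run to run): the in-sample MSE keeps improving as $\lambda$ shrinks, but the cross-validated error stored in cvfit$cvm is, by construction, smallest at lambda.min and grows again for smaller $\lambda$:

library(glmnet)

set.seed(1)
x = replicate(100, rnorm(10))
y = rnorm(10)

# n = 10 with 10 folds is leave-one-out; glmnet warns and enforces
# grouped = FALSE since there is only one observation per fold
cvfit = cv.glmnet(x, y, nfolds = 10)

min(cvfit$cvm)                          # CV error at lambda.min, the curve's minimum

i = which.min(abs(cvfit$lambda - 0.1))  # path value closest to the hand-picked s = 0.1
cvfit$cvm[i]                            # larger: smaller lambda does worse out of sample

plot(cvfit)                             # CV curve rises as lambda drops below lambda.min

The in-sample MSE at $s = 0.1$ looks better only because the model is scored on the same ten points it was fit to. Out of sample, those "actual predictions" are worse than simply predicting $\bar{y}$, and that is exactly what the cross-validation picked up.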