Solved – glmnet LASSO regression only yields fitted coefficients equal 0


Here is the data set I'm working with:


I'm trying to find the best possible multiple regression with the variable R as the dependent variable and the remaining variables as independent variables.

Here's what I did in R:

> trainX <- as.matrix(spxdata[4:11])
> trainY <- spxdata[[3]]
> CV = cv.glmnet(x = trainX, y = trainY, alpha = 1, nlambda = 100)
Warning message:
Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
> plot(CV)
> fit = glmnet(x = trainX, y = trainY, alpha = 1, lambda = CV$lambda.1se)
> fit$beta[,1]
    RE VOL260 VOL360     PE     PX   FCFY   GADY    NDE 
     0      0      0      0      0      0      0      0 

And here's the CV plot:

[Image: cross-validation plot produced by plot(CV)]

Why is there a warning message and why are all the fitted coefficients zero?

Best Answer

  1. The warning appears because you seem to have fewer than 30 observations. cv.glmnet defaults to 10 folds (nfolds = 10), which leaves fewer than 3 observations per fold, so it switches to grouped = FALSE. The warning does not appear to be related to the zero coefficients.
  2. The simplest explanation for the all-zero coefficients is that the data do not support a more complex model: the cross-validation error is lowest (or, since you use lambda.1se, within one standard error of the lowest) at heavy shrinkage, where the lasso penalty sets every coefficient to zero.
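Both points can be checked directly. The following is a minimal sketch, not the asker's actual analysis: the original data are not available, so it simulates a small pure-noise data set (the sizes n = 25 and p = 8 and the names x, y and cv_fit are assumptions made for illustration). It shows the fold arithmetic behind the warning, then tabulates the cross-validation error along the lambda path so you can see how many coefficients are nonzero at lambda.min and at lambda.1se.

```r
library(glmnet)

set.seed(1)
n <- 25                                   # hypothetical sample size (< 30)
p <- 8                                    # hypothetical number of predictors
x <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("V", 1:p)))
y <- rnorm(n)                             # simulated response, unrelated to x

# With the default nfolds = 10, each fold holds fewer than 3 observations,
# which is the condition that triggers the grouped = FALSE warning.
obs_per_fold_default <- n / 10
print(obs_per_fold_default)

# Using fewer folds keeps at least 3 observations per fold.
cv_fit <- cv.glmnet(x, y, alpha = 1, nfolds = 5)

# Cross-validation error and number of nonzero coefficients along the path.
path <- data.frame(lambda = cv_fit$lambda,
                   cvm    = cv_fit$cvm,
                   cvsd   = cv_fit$cvsd,
                   nzero  = cv_fit$nzero)
print(head(path))

# Number of nonzero coefficients at the two commonly used lambda choices.
nz_min <- cv_fit$nzero[match(cv_fit$lambda.min, cv_fit$lambda)]
nz_1se <- cv_fit$nzero[match(cv_fit$lambda.1se, cv_fit$lambda)]
print(c(lambda.min = unname(nz_min), lambda.1se = unname(nz_1se)))
```

On your own data, replace x and y with trainX and trainY. If nzero is 0 at the selected lambda, the cross-validation is telling you that the intercept-only model predicts about as well as any sparser-penalty fit.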

If you believe that some coefficients shouldn't be zero in the fitted model, you might consider:

  • A ridge regression (alpha = 0 in glmnet), which does not set coefficients exactly to zero (it may still shrink them heavily, however)

  • A Bayesian approach, where you set informative priors for coefficients you believe to be non-zero
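As a rough illustration of the first option only, here is a minimal ridge sketch with glmnet. It again uses simulated stand-in data (n, p, x, y and the object names are assumptions, not the asker's data); the only change from a lasso fit is alpha = 0.

```r
library(glmnet)

set.seed(1)
n <- 25                                   # hypothetical sample size
p <- 8                                    # hypothetical number of predictors
x <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("V", 1:p)))
y <- rnorm(n)                             # simulated response

# alpha = 0 selects the ridge penalty; nfolds = 5 keeps >= 3 observations per fold.
ridge_cv <- cv.glmnet(x, y, alpha = 0, nfolds = 5)

# Coefficients (intercept plus p slopes) at the lambda with the lowest CV error.
ridge_coef <- coef(ridge_cv, s = "lambda.min")
print(ridge_coef)
```

Keep in mind that nonzero ridge coefficients are not evidence of real effects: with data this small they can be shrunk very close to zero, and the cross-validation error is still the better guide to whether the predictors help.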
