Solved – How to interpret glmnet

Tags: glmnet, r, regression, regularization

I am trying to fit a multiple linear regression model with approximately 60 predictor variables and 30 observations, so I am using the glmnet package for regularized regression because p > n.

I have been going through the documentation and other questions, but I still can't interpret the results. Here's some sample code (with 20 predictors and 10 observations, to simplify):

I create a matrix x with one row per observation and one column per predictor, and a vector y representing the response variable:

> x=matrix(rnorm(10*20),10,20)
> y=rnorm(10)

I fit a glmnet model, leaving alpha at its default (alpha = 1, the lasso penalty):

> fit1=glmnet(x,y)
> print(fit1)

I understand that I get a sequence of fits for decreasing values of lambda (i.e., the penalty):

Call:  glmnet(x = x, y = y) 

        Df    %Dev   Lambda
  [1,]  0 0.00000 0.890700
  [2,]  1 0.06159 0.850200
  [3,]  1 0.11770 0.811500
  [4,]  1 0.16880 0.774600
   .
   .
   .
  [96,] 10 0.99740 0.010730
  [97,] 10 0.99760 0.010240
  [98,] 10 0.99780 0.009775
  [99,] 10 0.99800 0.009331
 [100,] 10 0.99820 0.008907
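As a side note, the three columns of this printout can be read straight off the fitted object. Here is a minimal sketch (the object names follow the code above; the seed is an assumption added for reproducibility):

```r
library(glmnet)

set.seed(1)  # assumed seed, only so the numbers are reproducible
x <- matrix(rnorm(10 * 20), 10, 20)
y <- rnorm(10)
fit1 <- glmnet(x, y)

fit1$lambda[1:3]     # the decreasing lambda sequence (the Lambda column)
fit1$df[1:3]         # number of nonzero coefficients at each lambda (the Df column)
fit1$dev.ratio[1:3]  # fraction of null deviance explained (the %Dev column)
```

So the printout is just a summary of the whole regularization path stored in fit1.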

Now I extract the estimated beta coefficients, choosing, for example, the smallest lambda value returned by glmnet:

> predict(fit1,type="coef", s = 0.008907)

21 x 1 sparse Matrix of class "dgCMatrix"
                  1
(Intercept) -0.08872364
V1           0.23734885
V2          -0.35472137
V3          -0.08088463
V4           .         
V5           .         
V6           .         
V7           0.31127123
V8           .         
V9           .         
V10          .         
V11          0.10636867
V12          .         
V13         -0.20328200
V14         -0.77717745
V15          .         
V16         -0.25924281
V17          .         
V18          .         
V19         -0.57989929
V20         -0.22522859
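To pull out just the variables with nonzero coefficients (the ones not shown as "."), something like the following sketch works; the object names follow the code above, and the seed is an assumption:

```r
library(glmnet)

set.seed(1)  # assumed seed for reproducibility
x <- matrix(rnorm(10 * 20), 10, 20)
y <- rnorm(10)
fit1 <- glmnet(x, y)

# coef() at a given s returns a sparse dgCMatrix; densify it to index by name
cf <- as.matrix(coef(fit1, s = min(fit1$lambda)))
nonzero <- rownames(cf)[cf[, 1] != 0]  # names of the selected variables
nonzero
```

The "." entries in the printout are exact zeros in the sparse matrix, i.e., variables the lasso dropped at that lambda.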

If instead I choose lambda with

cv <- cv.glmnet(x,y)
model=glmnet(x,y,lambda=cv$lambda.min)

then all of the coefficients come out as (.), i.e., they are shrunk to exactly zero.

Doubts and questions:

  1. I am not sure about how to choose lambda.
  2. Should I use the non-(.) variables to fit another model? In my case I would like to keep as many variables as possible.
  3. How do I get p-values, i.e., how do I know which variables significantly predict the response?

I apologize for my poor statistical knowledge! And thank you for any help.

Best Answer

Here's an unintuitive fact: you're not actually supposed to give glmnet a single value of lambda. From the documentation:

Do not supply a single value for lambda (for predictions after CV use predict() instead). Supply instead a decreasing sequence of lambda values. glmnet relies on its warm starts for speed, and it's often faster to fit a whole path than compute a single fit.

cv.glmnet will help you choose lambda, as you alluded to in your examples. The authors of the glmnet package suggest cv$lambda.1se instead of cv$lambda.min, but in practice I've had success with the latter.

After running cv.glmnet, you don't have to rerun glmnet! Every lambda in the grid (cv$lambda) has already been fit. This technique is called a "warm start," and you can read more about it here. Paraphrasing from the introduction, a warm start reduces the running time of an iterative method by using the solution of one optimization problem (e.g., glmnet with a larger lambda) as the starting value for a related problem (e.g., glmnet with a smaller lambda).
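To make the contrast concrete, here is a small sketch using simulated data like the question's (the seed is an assumption; the speed claim is the documentation's, not benchmarked here):

```r
library(glmnet)

set.seed(1)  # assumed seed for reproducibility
x <- matrix(rnorm(10 * 20), 10, 20)
y <- rnorm(10)

# Recommended: one call fits the entire decreasing lambda sequence,
# with each fit warm-started from the previous (larger) lambda.
path.fit <- glmnet(x, y)
length(path.fit$lambda)  # the whole grid is already available

# Discouraged by the documentation: a single-lambda fit gets no warm start.
single.fit <- glmnet(x, y, lambda = 0.01)
```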

To extract the desired run from the cv.glmnet fit, try this:

small.lambda.index <- which(cv$lambda == cv$lambda.min)
small.lambda.betas <- cv$glmnet.fit$beta[, small.lambda.index]

Revision (1/28/2017)

No need to hack into the glmnet object like I did above; take @alex23lemm's advice below and pass s = "lambda.min", s = "lambda.1se", or some other number (e.g., s = .007) to both coef and predict. Note that your coefficients and predictions depend on this value, which is set by cross-validation, so use a seed for reproducibility! And don't forget that if you don't supply an s to coef and predict, you'll be using the default of s = "lambda.1se". I have warmed up to that default after seeing it work better in small-data situations. s = "lambda.1se" also tends to provide more regularization, so if you're working with alpha > 0, it will also tend towards a more parsimonious model. You can also choose a numerical value of s with the help of plot.glmnet to land somewhere in between (just don't forget to exponentiate the values from the x-axis!).
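Putting that advice together, the whole recommended workflow looks roughly like this sketch (simulated data as in the question; the seed and fold behavior with n = 10 are assumptions, and cv.glmnet may warn about small folds on such a tiny sample):

```r
library(glmnet)

set.seed(42)  # assumed seed so the CV folds, and hence lambda.min, are reproducible
x <- matrix(rnorm(10 * 20), 10, 20)
y <- rnorm(10)

cv <- cv.glmnet(x, y)              # fits the whole lambda path and cross-validates it

coef(cv, s = "lambda.1se")         # the default: more regularized, sparser model
coef(cv, s = "lambda.min")         # the lambda minimizing CV error
predict(cv, newx = x, s = "lambda.min")  # predictions at the chosen lambda
```

No second call to glmnet is needed: coef and predict on the cv.glmnet object look up the already-fitted path.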