Solved – How to interpret this glmnet() code and its output in R

glmnet, r

I am not quite sure how to interpret the output of this code:

 coef(ridge_model, s = cv.glmnet(model, y, k=k)$lambda.min)

ridge_model is the output of glmnet()

What role does the argument 's' play?

Output:

7 x 1 sparse Matrix of class "dgCMatrix"
                    1
(Intercept) 86.825637
(Intercept)  .       
x1           3.924821
x2           9.816783
x3          11.770995
x4B         22.385858
x4C         -6.438195

My confusion is in understanding how the coef() function works. ridge_model is the output of glmnet(), so it represents the fitted model for different lambda values. Each lambda would have its own set of coefficients.

Then there is cv.glmnet(), which gives the k-fold cross-validation output and the lambda value with the minimum cross-validated error. We are giving this lambda as an input to the 's' argument. How would this then affect the ridge model, which already has its own lambda values?
coef(ridge_model, s = cv.glmnet(model, y, k=k)$lambda.min)

Best Answer

This looks incorrect; you probably wanted:

# cv.glmnet() takes the number of folds as nfolds; it has no argument named k
fit <- cv.glmnet(model, y, nfolds = k)
coef(fit, s = "lambda.min")

which will return the coefficients using the internal fit from the cross validation.

Unless ridge_model has the same predictors, weights, mixing parameter, etc., plugging a penalty parameter from one model into another seems odd. And if all of those were the same, ridge_model would be identical to fit$glmnet.fit above, and therefore redundant.
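As for the role of 's': in coef() and predict() it selects the penalty value (lambda) at which coefficients are read off the fitted path. If the value is not one of the lambdas on the path, glmnet by default interpolates linearly between the neighbouring fits. The sketch below uses simulated data (the data, the object names, and the choice of 5 folds are assumptions for illustration, not the asker's setup) to show that coef() on the cross-validation object and coef() on its internal glmnet fit agree when given the same lambda.

```r
library(glmnet)

set.seed(1)                                   # CV folds are assigned at random
x <- matrix(rnorm(100 * 5), nrow = 100)       # simulated predictors (assumption)
colnames(x) <- paste0("x", 1:5)
y <- as.numeric(x %*% c(3, -2, 0, 0, 1) + rnorm(100))

# alpha = 0 requests ridge; cv.glmnet() forwards alpha to glmnet()
cv_fit <- cv.glmnet(x, y, alpha = 0, nfolds = 5)

# The internal fit stores one column of coefficients per lambda on the path
dim(coef(cv_fit$glmnet.fit))

# Coefficients at the CV-selected penalty, via the cv.glmnet object
b_cv <- coef(cv_fit, s = "lambda.min")

# The same request spelled out: s picks one lambda from the stored path
b_path <- coef(cv_fit$glmnet.fit, s = cv_fit$lambda.min)

# Compare the values only (the column labels of the two results can differ)
all.equal(as.vector(as.matrix(b_cv)), as.vector(as.matrix(b_path)))
```

This only works cleanly because both calls refer to the same underlying fit, which is the point made above about ridge_model and fit$glmnet.fit.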

Related Solutions

Solved – How to interpret the coefficients returned by cv.glmnet? Are they feature-importance

First of all, any variable with a coefficient of zero has been dropped from the model, so you can say it was unimportant.

Second of all, you can't really make inferences about the importance of coefficients unless you scaled all the variables prior to the regression so that they had the same mean and standard deviation (and even then you have to be careful!). If your variables are unscaled, the size of each coefficient mostly reflects the units of its variable: for the same effect, a variable measured on a larger scale will tend to have a smaller absolute coefficient.

Another option would be to bootstrap sample your data, fit a model to each sample, and calculate confidence intervals around your coefficients.
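A minimal sketch of that bootstrap idea, assuming simulated data, a lasso fit (alpha = 1), and percentile intervals; the number of resamples and all object names here are hypothetical choices. Intervals for penalized estimates are shrunk towards zero, so it is safer to read them as a rough stability check than as classical confidence intervals.

```r
library(glmnet)

set.seed(42)
n <- 200
x <- matrix(rnorm(n * 4), nrow = n, dimnames = list(NULL, paste0("x", 1:4)))
y <- as.numeric(x %*% c(2, 0, -1, 0) + rnorm(n))    # simulated response (assumption)

n_boot <- 50   # kept small so the example runs quickly

# One column of coefficients per bootstrap resample
boot_coefs <- replicate(n_boot, {
  idx <- sample(n, replace = TRUE)                  # resample rows with replacement
  fit <- cv.glmnet(x[idx, ], y[idx], alpha = 1)     # lambda is re-chosen on each resample
  as.vector(as.matrix(coef(fit, s = "lambda.min")))
})
rownames(boot_coefs) <- c("(Intercept)", colnames(x))

# Percentile intervals: 2.5% and 97.5% quantiles across resamples
ci <- t(apply(boot_coefs, 1, quantile, probs = c(0.025, 0.975)))
ci
```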

Finally, how are you choosing the "alpha" parameter for your model?

Solved – How to interpret glmnet

Here's an unintuitive fact: you're not actually supposed to give glmnet a single value of lambda. From the documentation:

Do not supply a single value for lambda (for predictions after CV use predict() instead). Supply instead a decreasing sequence of lambda values. glmnet relies on its warms starts for speed, and its often faster to fit a whole path than compute a single fit.

cv.glmnet will help you choose lambda, as you alluded to in your examples. The authors of the glmnet package suggest cv$lambda.1se instead of cv$lambda.min, but in practice I've had success with the latter.

After running cv.glmnet, you don't have to rerun glmnet! Every lambda in the grid (cv$lambda) has already been fit. Fitting the whole grid is cheap because of a technique called "warm start": it reduces the running time of iterative methods by using the solution of a different optimization problem (e.g., glmnet with a larger lambda) as the starting value for a later optimization problem (e.g., glmnet with a smaller lambda).

To extract the desired run from cv$glmnet.fit, try this:
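To make the "already been run" point concrete, here is a small sketch on simulated data (the data and object names are assumptions) that inspects what a cv.glmnet object already holds: the decreasing lambda grid and one column of coefficients per lambda.

```r
library(glmnet)

set.seed(7)
x <- matrix(rnorm(150 * 6), nrow = 150)              # simulated predictors (assumption)
y <- as.numeric(x[, 1] - 2 * x[, 2] + rnorm(150))

cv <- cv.glmnet(x, y)            # defaults: alpha = 1 (lasso), 10 folds

length(cv$lambda)                # number of penalties on the grid
dim(cv$glmnet.fit$beta)          # predictors x penalties: one column per lambda
is.unsorted(rev(cv$lambda))      # FALSE when the grid is stored in decreasing order

# The two penalties picked out by cross-validation
c(lambda.min = cv$lambda.min, lambda.1se = cv$lambda.1se)
```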

small.lambda.index <- which(cv$lambda == cv$lambda.min)
small.lambda.betas <- cv$glmnet.fit$beta[, small.lambda.index]

Revision (1/28/2017)

No need to hack into the glmnet object like I did above; take @alex23lemm's advice below and pass s = "lambda.min", s = "lambda.1se" or some other number (e.g., s = .007) to both coef and predict. Note that your coefficients and predictions depend on this value, which is set by cross-validation. Use a seed for reproducibility! And don't forget that if you don't supply an "s" in coef and predict, you'll be using the default of s = "lambda.1se". I have warmed up to that default after seeing it work better in a small-data situation. s = "lambda.1se" also tends to provide more regularization, so if you're working with alpha > 0, it will also tend towards a more parsimonious model. You can also choose a numerical value of s with the help of plot.glmnet to get somewhere in between (just don't forget to exponentiate the values from the x axis!).
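A short sketch of these options on simulated data (the data, seed, and object names are assumptions for illustration). It also checks that the manual indexing from the earlier snippet returns the same slopes as s = "lambda.min".

```r
library(glmnet)

set.seed(123)                                  # fixes the random CV folds
x <- matrix(rnorm(120 * 5), nrow = 120)        # simulated predictors (assumption)
y <- as.numeric(x[, 1] + 0.5 * x[, 3] + rnorm(120))

cv <- cv.glmnet(x, y)

b_min     <- coef(cv, s = "lambda.min")   # penalty with the lowest CV error
b_1se     <- coef(cv, s = "lambda.1se")   # larger penalty within one SE of the minimum
b_default <- coef(cv)                     # no s supplied: same as s = "lambda.1se"
b_num     <- coef(cv, s = 0.007)          # any numeric penalty; interpolated on the path

# Manual extraction by index, as in the earlier snippet (slopes only, no intercept)
idx    <- which(cv$lambda == cv$lambda.min)
manual <- cv$glmnet.fit$beta[, idx]
all.equal(unname(manual), as.vector(as.matrix(b_min))[-1])

# predict() accepts the same s argument
p_min <- predict(cv, newx = x[1:3, ], s = "lambda.min")
p_min
```

Rerunning without set.seed() can change the folds, and with them lambda.min, lambda.1se, and every number above.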
