Finally we were able to produce the same solution with both methods! The first issue is that glmnet solves the lasso problem as stated in the question, while lars uses a slightly different normalization in the objective function: it replaces $\frac{1}{2N}$ by $\frac{1}{2}$. Second, the two methods normalize the data differently, so that normalization must be switched off when calling them.
To reproduce this, and verify that lars and glmnet compute the same lasso solutions, the following lines in the code above must be changed:
la <- lars(X,Y,intercept=TRUE, max.steps=1000, use.Gram=FALSE)
to
la <- lars(X,Y,intercept=TRUE, normalize=FALSE, max.steps=1000, use.Gram=FALSE)
and
glm2 <- glmnet(X,Y,family="gaussian",lambda=0.5*la$lambda,thresh=1e-16)
to
glm2 <- glmnet(X,Y,family="gaussian",lambda=1/nbSamples*la$lambda,standardize=FALSE,thresh=1e-16)
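The effect of that $1/N$ rescaling of $\lambda$ can also be checked directly. As a sketch (in Python with NumPy rather than the R packages above, using the one-predictor case where the lasso has a closed-form soft-threshold solution), minimizing the lars-style objective $\frac{1}{2}\|y-xb\|^2+\lambda|b|$ and the glmnet-style objective $\frac{1}{2N}\|y-xb\|^2+\frac{\lambda}{N}|b|$ gives the same coefficient:

```python
import numpy as np

def soft_threshold(z, t):
    # S(z, t) = sign(z) * max(|z| - t, 0)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(0)
N = 100
x = rng.normal(size=N)
y = 2.0 * x + rng.normal(size=N)
lam = 5.0  # lars-style penalty weight

# Minimizer of (1/2)||y - x b||^2 + lam |b|  (lars-style objective)
b_lars = soft_threshold(x @ y, lam) / (x @ x)

# Minimizer of (1/(2N))||y - x b||^2 + (lam/N) |b|  (glmnet-style objective
# with lambda rescaled by 1/N). Multiplying this objective through by N
# recovers the lars-style objective, so the minimizers must agree.
b_glmnet = soft_threshold((x @ y) / N, lam / N) / ((x @ x) / N)

print(b_lars, b_glmnet)
```

This is exactly why passing 1/nbSamples*la$lambda to glmnet reproduces the lars solution in the R code above.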
For these models, variable importance is based on the regression coefficients of the final model. Larger (absolute) coefficients are associated with larger effects. Using scale = FALSE
is good here so that you also get the signs.
There are always pitfalls with these measures, depending on how you want to measure importance. They don't measure lack of fit at all, so if your model is only 51% accurate, they are not very reflective of the data. In the case of regression coefficients, main effects are misleading when interactions are present, and so on.
As for correlation between predictors, Friedman et al. (2010, JSS) state:
Ridge regression is known to shrink the coefficients of correlated predictors towards each other, allowing them to borrow strength from each other. In the extreme case of $k$ identical predictors, they each get identical coefficients with $1/k^{th}$ the size that any single one would get if fit alone.[...]
Lasso, on the other hand, is somewhat indifferent to very correlated predictors, and will tend to pick one and ignore the rest.
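A quick numerical illustration of that behaviour (a sketch in Python with scikit-learn rather than the R packages discussed here, assuming scikit-learn is available): with two identical copies of a predictor, ridge splits the coefficient evenly between them, while the lasso puts all the weight on one copy and zeroes out the other:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
X = np.column_stack([x, x])          # two identical predictors
y = 3.0 * x + rng.normal(size=n)

ridge = Ridge(alpha=1.0, fit_intercept=False).fit(X, y)
lasso = Lasso(alpha=0.1, fit_intercept=False).fit(X, y)

# Ridge: correlated coefficients shrink toward each other -- identical
# columns get identical coefficients, each about half the single-column fit.
print(ridge.coef_)

# Lasso: indifferent between exact copies -- it tends to pick one column
# and set the other to zero.
print(lasso.coef_)
```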
We have a pretty good example of that in Section 6.4 of APM.
Max
Best Answer
As far as I understand glmnet, $\alpha=0$ would actually be a ridge penalty and $\alpha=1$ would be a lasso penalty (rather than the other way around), and as far as glmnet is concerned you can fit those end cases. The penalty with $\alpha=0.1$ would be fairly similar to the ridge penalty, but it is not the ridge penalty; if values of $\alpha$ below $0.1$ were not considered, you can't necessarily infer much more than that from the fact that the search stopped at that endpoint. If you know that an $\alpha$ value only slightly larger than $0.1$ was worse, then it is likely that a wider range would have chosen a smaller $\alpha$, but that doesn't suggest it would have been $0$; I expect it would not. If the grid of values is coarse, it may well be that a value larger than $0.1$ would be better.
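To make the convention concrete, here is a sketch in Python with scikit-learn (an analogue, not glmnet itself; sklearn's ElasticNet mixing parameter l1_ratio plays the role of glmnet's $\alpha$): l1_ratio = 1 reproduces the lasso end case, and l1_ratio = 0 reproduces ridge once the difference in penalty scaling between the two implementations is matched.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(2)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)
a = 0.5

# l1_ratio=1: pure L1 penalty, i.e. the lasso end case.
en_lasso = ElasticNet(alpha=a, l1_ratio=1.0, fit_intercept=False,
                      tol=1e-10, max_iter=100000).fit(X, y)
lasso = Lasso(alpha=a, fit_intercept=False,
              tol=1e-10, max_iter=100000).fit(X, y)

# l1_ratio=0: pure L2 penalty, i.e. the ridge end case. sklearn's
# ElasticNet objective is (1/(2n))||y - Xw||^2 + 0.5*a*||w||^2 here,
# which matches Ridge once its penalty is rescaled by n.
en_ridge = ElasticNet(alpha=a, l1_ratio=0.0, fit_intercept=False,
                      tol=1e-10, max_iter=100000).fit(X, y)
ridge = Ridge(alpha=a * n, fit_intercept=False).fit(X, y)

print(np.max(np.abs(en_lasso.coef_ - lasso.coef_)))
print(np.max(np.abs(en_ridge.coef_ - ridge.coef_)))
```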
[You may want to check whether there was some other reason that $\alpha$ might have ended up at an endpoint; e.g. I seem to recall $\lambda$ got set to an endpoint in forecasting if coefficients for lambdaOpt were not saved.]