Solved – CV for LASSO tuning parameter using LARS

cross-validation, lars, lasso

If I use the LARS algorithm to fit the LASSO path, is it sufficient to cross-validate using the values of $\lambda$ at each step in LARS or is it better to use a finer grid of $\lambda$ values? I guess I can ask this in two parts:

  • Is the prediction-optimal model found at one of the LARS steps?

  • Is the correct subset of variables found at one of the LARS steps?

I would have done the cross-validation over the parameter $s = \left\Vert \beta \right\Vert_{1} / \max \left\Vert \beta \right\Vert_{1}$ on a fine grid with $s \in \left( 0,1\right)$… but then I thought that defeats the purpose of using LARS, which is supposed to be attractive for its lower computational expense. Some clarification would be greatly appreciated.
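For concreteness, the mapping between the LARS knots and the shrinkage fraction $s$ can be read straight off a single LARS run. A hypothetical sketch using scikit-learn's `lars_path` (the simulated data and all variable names are my own, not part of any standard recipe):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lars_path

# Simulated data for illustration only.
X, y = make_regression(n_samples=100, n_features=8, n_informative=3,
                       noise=3.0, random_state=1)

# One LARS run gives the whole LASSO path.
# alphas: the knot values of lambda (decreasing);
# coefs: coefficients at each knot, shape (n_features, n_knots).
alphas, _, coefs = lars_path(X, y, method="lasso")

# Shrinkage fraction s = ||beta||_1 / max ||beta||_1 at each knot.
l1 = np.abs(coefs).sum(axis=0)
s = l1 / l1[-1]   # runs from 0 (all-zero model) to 1 (least-squares fit)
```

A fine grid over $s$ would then correspond to points between these knots, which is exactly the extra work I was hoping to avoid.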

Best Answer

I now have a better understanding of the LASSO and the LARS algorithm, so I thought I'd share what I've learned for the benefit of others. I am aware that the coefficient path is piecewise linear between LARS steps, so that, as Donbeo says, linear interpolation can be used to recover all LASSO solutions. The point of my question was to ascertain whether this is even necessary. Apparently it is not.

  1. For variable selection - Yes, the best subset occurs at a LARS step. Reasoning:

    • The LARS steps are transition points at which the active set changes (as @Donbeo points out); between steps, the subset of variables is unchanged. Therefore, if the correct subset is somewhere on the path, it must occur at a transition point.
    • The LASSO path has been shown to contain the true model with high probability (Meinshausen and Bühlmann, 2006).
  2. For prediction - Yes, the optimal $\lambda$ occurs at a LARS step. I've just worked through the paper by Zou et al. (2007), which answers my question (at least when $X$ has full rank):

    • They show that an unbiased and consistent estimate of the degrees of freedom of the LASSO is the number of variables in the model (i.e., the size of the active set). Hence, all models on the interval between two LARS steps have the same degrees of freedom.
    • Furthermore, the squared error loss (which we minimize for prediction) is continuous and strictly monotone on each interval. Since the degrees-of-freedom penalty is constant on an interval, any selection criterion of the form loss plus a penalty in the degrees of freedom is minimized at an endpoint of an interval. Therefore, the optimal value of $\lambda$ does occur at one of the LARS steps.
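The practical upshot of the answer is that cross-validation only needs to visit the knot values of $\lambda$, not a fine grid. A minimal sketch of that procedure (assuming scikit-learn; the simulated data, fold counts, and variable names are illustrative choices of mine):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, lars_path
from sklearn.model_selection import KFold

# Simulated data for illustration only.
X, y = make_regression(n_samples=150, n_features=12, n_informative=4,
                       noise=5.0, random_state=0)

# Candidate lambdas: only the knots of the LARS path, no fine grid.
alphas, _, _ = lars_path(X, y, method="lasso")
candidates = alphas[alphas > 0]

# Standard K-fold CV over the (short) list of knot values.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
cv_mse = []
for a in candidates:
    fold_err = []
    for tr, te in kf.split(X):
        fit = Lasso(alpha=a, max_iter=100000).fit(X[tr], y[tr])
        fold_err.append(np.mean((y[te] - fit.predict(X[te])) ** 2))
    cv_mse.append(np.mean(fold_err))

best_alpha = candidates[int(np.argmin(cv_mse))]
```

scikit-learn's `LassoLarsCV` packages essentially this idea, cross-validating along the LARS path directly.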