Solved – CV for LASSO tuning parameter using LARS

cross-validation, lars, lasso

If I use the LARS algorithm to fit the LASSO path, is it sufficient to cross-validate using the values of $\lambda$ at each step in LARS or is it better to use a finer grid of $\lambda$ values? I guess I can ask this in two parts:

  • Is the prediction-optimal model found at one of the LARS steps?

  • Is the correct subset of variables found at one of the LARS steps?

I would have done the cross-validation over the parameter $s = \left\Vert \beta \right\Vert_{1} / \max \left\Vert \beta \right\Vert_{1}$ on a fine grid with $s \in \left( 0,1\right)$… but then I thought that defeats the purpose of using LARS, which is supposed to be attractive for its lower computational expense. Some clarification would be greatly appreciated.
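For concreteness, the mapping between the LARS knots and the shrinkage fraction $s$ can be read straight off a single LARS run. A hypothetical sketch using scikit-learn's `lars_path` (the simulated data and all variable names are my own, not part of any standard recipe):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lars_path

# Simulated data for illustration only.
X, y = make_regression(n_samples=100, n_features=8, n_informative=3,
                       noise=3.0, random_state=1)

# One LARS run gives the whole LASSO path.
# alphas: the knot values of lambda (decreasing);
# coefs: coefficients at each knot, shape (n_features, n_knots).
alphas, _, coefs = lars_path(X, y, method="lasso")

# Shrinkage fraction s = ||beta||_1 / max ||beta||_1 at each knot.
l1 = np.abs(coefs).sum(axis=0)
s = l1 / l1[-1]   # runs from 0 (all-zero model) to 1 (least-squares fit)
```

A fine grid over $s$ would then correspond to points between these knots, which is exactly the extra work I was hoping to avoid.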

Best Answer

I now have a better understanding of the LASSO and the LARS algorithm, so I thought I'd share what I've learned for the benefit of others. I am aware that the coefficient path is piecewise linear between LARS steps, so that, as Donbeo says, linear interpolation can be used to recover all LASSO solutions. The point of my question was to ascertain whether this is even necessary. Apparently it is not.

  1. For variable selection - Yes, the best subset occurs at a LARS step. Reasoning:

    • The LARS steps are transition points at which the active set changes (as @Donbeo points out); between steps, the subset of variables is unchanged. Therefore, if the correct subset is somewhere on the path, it must occur at a transition point.
    • The LASSO path has been shown to contain the true model with high probability (Meinshausen and Bühlmann, 2006).
  2. For prediction - Yes, the optimal $\lambda$ occurs at a LARS step. I've just worked through the paper by Zou et al. (2007), which answers my question (at least when $X$ has full rank):

    • They show that an unbiased and consistent estimate of the degrees of freedom of the LASSO is the number of variables in the model (i.e., the size of the active set). Hence, all models on the interval between two LARS steps have the same degrees of freedom.
    • Furthermore, the squared error loss (which we minimize for prediction) is continuous and strictly monotone on each interval. Since the degrees-of-freedom penalty is constant on an interval, any selection criterion of the form loss plus a penalty in the degrees of freedom is minimized at an endpoint of an interval. Therefore, the optimal value of $\lambda$ does occur at one of the LARS steps.
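The practical upshot of the answer is that cross-validation only needs to visit the knot values of $\lambda$, not a fine grid. A minimal sketch of that procedure (assuming scikit-learn; the simulated data, fold counts, and variable names are illustrative choices of mine):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, lars_path
from sklearn.model_selection import KFold

# Simulated data for illustration only.
X, y = make_regression(n_samples=150, n_features=12, n_informative=4,
                       noise=5.0, random_state=0)

# Candidate lambdas: only the knots of the LARS path, no fine grid.
alphas, _, _ = lars_path(X, y, method="lasso")
candidates = alphas[alphas > 0]

# Standard K-fold CV over the (short) list of knot values.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
cv_mse = []
for a in candidates:
    fold_err = []
    for tr, te in kf.split(X):
        fit = Lasso(alpha=a, max_iter=100000).fit(X[tr], y[tr])
        fold_err.append(np.mean((y[te] - fit.predict(X[te])) ** 2))
    cv_mse.append(np.mean(fold_err))

best_alpha = candidates[int(np.argmin(cv_mse))]
```

scikit-learn's `LassoLarsCV` packages essentially this idea, cross-validating along the LARS path directly.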