If we know the Cholesky decomposition $V^{-1} = L^TL$, say, then
$$(y - X\beta)^T V^{-1} (y - X\beta) = (Ly - LX\beta)^T (Ly - LX\beta)$$
and we can use standard algorithms (with whatever penalization function one prefers) by replacing the response with the vector $Ly$ and the predictors with the matrix $LX$.
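As a concrete illustration, here is a minimal R sketch of this whitening step; the sample size, the AR(1)-type covariance matrix V, and the data are all invented for the example:

set.seed(1)
n <- 100; p <- 5
X <- matrix(rnorm(n * p), n, p)
beta <- c(2, -1, 0, 0, 1)
V <- 0.5^abs(outer(1:n, 1:n, "-"))         # assumed AR(1)-type error covariance
y <- X %*% beta + t(chol(V)) %*% rnorm(n)  # errors with covariance V
L <- chol(solve(V))   # upper-triangular factor with V^{-1} = t(L) %*% L
yt <- L %*% y         # transformed response Ly
Xt <- L %*% X         # transformed predictors LX

Any standard solver can then be applied to (yt, Xt), e.g. lm.fit(Xt, yt) for ordinary least squares or a penalized method such as glmnet(Xt, yt).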
Finally we were able to produce the same solution with both methods! The first issue is that glmnet solves the lasso problem as stated in the question, but lars uses a slightly different normalization in the objective function: it replaces $\frac{1}{2N}$ by $\frac{1}{2}$, so a lars penalty $\lambda$ corresponds to a glmnet penalty of $\lambda/N$. Second, the two methods normalize the data differently, so the normalization must be switched off when calling them.
To reproduce this, and to verify that lars and glmnet compute the same solutions to the lasso problem, the following lines in the code above must be changed:
la <- lars(X,Y,intercept=TRUE, max.steps=1000, use.Gram=FALSE)
to
la <- lars(X,Y,intercept=TRUE, normalize=FALSE, max.steps=1000, use.Gram=FALSE)
and
glm2 <- glmnet(X,Y,family="gaussian",lambda=0.5*la$lambda,thresh=1e-16)
to
glm2 <- glmnet(X,Y,family="gaussian",lambda=1/nbSamples*la$lambda,standardize=FALSE,thresh=1e-16)
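With those two changes in place, the agreement can be checked directly. The following sketch assumes X, Y and nbSamples are defined as in the code referenced above; the breakpoint index is arbitrary:

library(lars)
library(glmnet)
la <- lars(X, Y, intercept = TRUE, normalize = FALSE, max.steps = 1000, use.Gram = FALSE)
glm2 <- glmnet(X, Y, family = "gaussian", lambda = 1/nbSamples * la$lambda, standardize = FALSE, thresh = 1e-16)
lam <- la$lambda[5]                            # pick one lars breakpoint
b_lars <- coef(la, s = lam, mode = "lambda")   # lars solution at that penalty
b_glmnet <- as.matrix(coef(glm2, s = lam / nbSamples))[-1, 1]  # drop intercept
max(abs(b_lars - b_glmnet))                    # should be very small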
Best Answer
Let's work with the lasso. Recall how a lasso regression model is fitted, given $\lambda$:
$$\min_{\beta\in\mathbb{R}^p}\left\{\frac{1}{N}\|y-X\beta\|_2^2+\lambda\|\beta\|_1\right\}$$
The first term is the mean squared residual, i.e. the squared 2-norm of the residual vector divided by $N$. The second term is the 1-norm of the parameter vector (typically not including the intercept entry $\beta_0$).
There is no reason whatsoever these two components should be comparable in magnitude. Your model could fit very well, yielding small residuals, but need large parameters. Or the other way around. Plus, you may or may not first standardize your predictors, which will change the parameter estimates.
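A tiny invented example makes the point: a model can fit almost perfectly while its coefficient, and hence the penalty term, is huge.

set.seed(3)
n <- 100
x <- rnorm(n)
y <- 1000 * x + rnorm(n)   # near-perfect fit, but a large coefficient
fit <- lm(y ~ x)
mean(resid(fit)^2)         # first term of the objective: about 1
sum(abs(coef(fit)[-1]))    # second term (excluding intercept): about 1000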
This applies to the estimate for $\beta$, given $\lambda$. Now, if you optimize $\lambda$, perhaps using cross-validation, this means that a priori you cannot say anything about the likely range of $\lambda$, other than $\lambda\geq 0$.
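For instance, in the glmnet parameterization $\frac{1}{2N}\|y-X\beta\|_2^2+\lambda\|\beta\|_1$, the smallest $\lambda$ that shrinks all coefficients to zero is $\lambda_{\max}=\max_j|x_j^\top y|/N$ (for centered data without an intercept), which scales with the units of the data. A quick invented check:

set.seed(2)
n <- 50; p <- 3
X <- scale(matrix(rnorm(n * p), n, p))
y <- rnorm(n); y <- y - mean(y)          # centered response, no intercept
max(abs(crossprod(X, y))) / n            # lambda_max for these units
max(abs(crossprod(X, 10 * y))) / n       # rescaling y rescales lambda_max too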
TL;DR: you appear to have misremembered. The optimum $\lambda$ in no way needs to be in some specific interval. Therefore, getting a "surprising" value does not tell you anything about the appropriateness (or not) of your lasso model.
The same of course applies to ridge regression or the elastic net.