The documentation for R's glmnet package states that, when fitting an elastic net, the glmnet function uses a sequence of $\lambda$ values starting at the smallest $\lambda$ for which all coefficients are zero. How can I find that value of $\lambda$?
Solved – How to find the smallest $\lambda$ such that all Lasso / Elastic Net coefficients are zero
elastic net, glmnet, lasso, machine learning, regression
Related Solutions
This is not a solution but some reflections on the possibilities and difficulties that I know of.
Whenever it is possible to specify a time-series model as $$Y_{t+1} = \mathbf{x}_t \beta + \epsilon_{t+1}$$ with $\mathbf{x}_t$ computable from covariates and time-lagged observations, it is also possible to compute the least-squares elastic net penalized estimator of $\beta$ using glmnet in R. It requires that you write code to compute $\mathbf{x}_t$ to form the model matrix that is to be specified in glmnet. This works for AR-models but not directly for ARMA-models, say. Moreover, the cross-validation procedures of glmnet are not sensible per se for time-series data.
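As a concrete illustration of the AR case, here is a minimal sketch, using synthetic data and illustrative variable names, of computing $\mathbf{x}_t$ from lagged observations with embed() and handing the resulting model matrix to glmnet:

```r
# Sketch: build the lagged model matrix for a penalized AR(p) fit.
# (Synthetic data; in practice y would be your observed series.)
set.seed(1)
n <- 200
y <- as.numeric(arima.sim(model = list(ar = c(0.5, -0.2)), n = n))

p <- 2                           # number of lags
Z <- embed(y, p + 1)             # row t: (y_t, y_{t-1}, ..., y_{t-p})
y_t <- Z[, 1]                    # response Y_{t+1}
x_t <- Z[, -1, drop = FALSE]     # lagged observations form x_t
colnames(x_t) <- paste0("lag", 1:p)

# The pair (x_t, y_t) is exactly what glmnet expects:
if (requireNamespace("glmnet", quietly = TRUE)) {
  fit <- glmnet::glmnet(x_t, y_t, alpha = 0.5)
}
```

Each row of x_t is computable from past observations only, so the least-squares structure required by glmnet is preserved.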
For more general models
$$Y_{t+1} = f(\mathbf{x}_t, \beta) + \epsilon_{t+1}$$
an implementation of an algorithm for computing the non-linear least-squares elastic net penalized estimator of $\beta$ is needed. To the best of my knowledge there is no such implementation in R. I am currently writing an implementation to solve the case where
$$Y_{t+1} = \mathbf{x}_t g(\beta) + \epsilon_{t+1}$$
the point being that it is paramount for model selection that the lasso penalization is on $\beta$ and not on $g(\beta)$. If I recall the ARIMA parametrization correctly, it also takes this form, but I cannot offer any code at the moment. It is (or will be) based on the paper "A coordinate gradient descent method for nonsmooth separable minimization".
Another issue is the selection of the amount of penalization (the tuning parameters). It will generally require a form of cross-validation for time-series, but I hope to be able to work out some less computationally demanding methods for specific models.
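One common alternative to random folds is rolling-origin validation: fit on an expanding window of past observations, predict one step ahead, and average the errors for each candidate $\lambda$. The sketch below uses an illustrative helper name (rolling_cv_error is not a glmnet function):

```r
# Rolling-origin validation error for one candidate lambda.
# rolling_cv_error() is an illustrative helper, not part of glmnet.
rolling_cv_error <- function(x, y, lambda, initial = 50, horizon = 1) {
  errs <- numeric(0)
  for (t in seq(initial, nrow(x) - horizon)) {
    # Fit only on data observed up to time t ...
    fit <- glmnet::glmnet(x[1:t, , drop = FALSE], y[1:t], lambda = lambda)
    # ... and evaluate on a strictly later observation.
    pred <- predict(fit, newx = x[t + horizon, , drop = FALSE], s = lambda)
    errs <- c(errs, (y[t + horizon] - as.numeric(pred))^2)
  }
  mean(errs)
}
# Choose lambda by minimizing rolling_cv_error over a grid of candidates.
```

This respects the temporal ordering that cv.glmnet's random folds ignore, at the cost of refitting the model many times.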
It's very likely that some categorical variables in your input dataset are being converted to dummy variables. For example, here is a model with four x variables ("Sepal.Width", "Petal.Length", "Petal.Width", and "Species") and one y variable ("Sepal.Length"):
library(glmnet)
data(iris)
x <- model.matrix(Sepal.Length~., iris)[,-1]
y <- iris$Sepal.Length
m <- cv.glmnet(x, y)
Surprisingly, we get 6 coefficients instead of the expected 5:
> coef(m, m$lambda.min)
6 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) 2.1670759
Sepal.Width 0.5032347
Petal.Length 0.8137398
Petal.Width -0.3127065
Speciesversicolor -0.6763395
Speciesvirginica -0.9595409
However, after some investigation we realize that the number of coefficients matches the number of columns in our input matrix, plus one for the intercept:
> nrow(coef(m, m$lambda.min)) == ncol(x) + 1
[1] TRUE
Furthermore, this is the same number of coefficients we get from a glm model:
> t(t(coef(glm(Sepal.Length~., data=iris))))
[,1]
(Intercept) 2.1712663
Sepal.Width 0.4958889
Petal.Length 0.8292439
Petal.Width -0.3151552
Speciesversicolor -0.7235620
Speciesvirginica -1.0234978
/edit: Here is an example with binary data. Note that the glmnet model produces the same number of coefficients as a glm model, and both produce the expected number. Check your code against a glm. Also note that both of these examples are reproducible: if there WERE a bug in glmnet (or, worse yet, glm), my example code would give the package authors (or the R core team) a first step toward identifying and fixing it.
#Setup a glmnet problem
set.seed(42)
ncol <- 10
nrow <- 1000
x <- matrix(sample(c(TRUE, FALSE), ncol*nrow, replace=TRUE), ncol=ncol)
cf <- runif(ncol(x)) * sample(0:1, ncol(x), replace=TRUE)
y <- rowSums(x %*% cf) + runif(nrow(x))/10
y <- matrix(as.integer(y>=mean(y)), ncol=1)
#Fit the model
m <- cv.glmnet(x=x,y=y,family="binomial",type.measure="auc")
coef(m, m$lambda.min)
#Check the number of coefficients
> nrow(coef(m, m$lambda.min)) == ncol(x) + 1
[1] TRUE
#Check the number of coefficients from a glm
> nrow(coef(m, m$lambda.min)) == length(coef(glm(y~., data=data.frame(y,x), family='binomial')))
[1] TRUE
Best Answer
A lasso solution $\widehat{\beta}(\lambda)$ solves $$\min_\beta \frac{1}{2}\|y-X\beta\|_2^2 +\lambda\|\beta\|_1,$$ and it is well known that $\widehat{\beta}(\lambda)=0$ for all $\lambda \geq \lambda_1$, where $\lambda_1 = \max_j |X_j^T y|$. That should give you the desired value.
Note that $\lambda_1$ may need a different scaling if the objective function is scaled differently.
Using the cars example with GLMNET:
fit <- glmnet(as.matrix(mtcars[,-1]), mtcars[,1], intercept=FALSE, standardize=FALSE)
1/32*max(abs(t(as.matrix(mtcars[,-1])) %*% mtcars[,1]))/(head(fit$lambda))[1]
This gives the value 1, as expected.
Note that both standardize and intercept are set to FALSE. If standardize and intercept are set to TRUE (the defaults), then $\lambda$ is calculated on the scaled regressors. (In this regard, take a look at https://think-lab.github.io/d/205/#5 for how to perform a proper scaling to get the results you want.):
xy <- scale(mtcars)
fit <- glmnet(as.matrix(mtcars[,-1]), mtcars[,1])
(1/32*max(abs(t(xy[,-1]) %*% mtcars[,1]*sqrt(32/31))))/(head(fit$lambda))[1]
This once again gives the value 1...
However I am not sure what glmnet is calculating if intercept = TRUE but standardize = FALSE.
We saw that glmnet, with its default options, calculates $\lambda_1$ as $$\lambda_{1} = \max_j \left| \frac{1}{n} \sum_{i=1}^n x_{ij}^* y_i \right|,$$ where $x_{ij}^* = \frac{x_{ij}-\overline{x}_j}{\sqrt{\frac{1}{n}\sum_{i=1}^n (x_{ij}-\overline{x}_j)^2}}$.
It turns out that for an elastic net problem (corresponding to $\alpha \in (0,1]$ in glmnet) its maximum value $\lambda_{1,\alpha}$ is calculated as
$$\lambda_{1,\alpha}= \lambda_{1}/\alpha.$$
Indeed, setting for example $\alpha=0.3$ we have:
aa <- 0.3
xy <- scale(mtcars)
fit <- glmnet(as.matrix(mtcars[,-1]), mtcars[,1], alpha=aa)
1/aa*(1/32*max(abs(t(xy[,-1]) %*% mtcars[,1]*sqrt(32/31))))/(head(fit$lambda))[1]
which results once again in an output value of $1$.
So much for the calculations. Note, however, that the elastic net criterion can be rewritten as a standard lasso problem.
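The rewriting works by data augmentation (Zou and Hastie's construction): appending $\sqrt{\lambda_2}\,I_p$ rows to $X$ and zeros to $y$ absorbs the ridge part of the penalty into the squared-error term, leaving a pure lasso on the augmented data. A minimal sketch with synthetic data and illustrative names:

```r
# Elastic net with ridge weight lambda2 as a plain lasso:
# minimize 0.5*||y* - X* b||^2 + lambda1*||b||_1 with
#   X* = rbind(X, sqrt(lambda2) * diag(p)),  y* = c(y, rep(0, p)).
set.seed(1)
n <- 30; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)
lambda2 <- 0.7

X_aug <- rbind(X, sqrt(lambda2) * diag(p))
y_aug <- c(y, rep(0, p))

# The augmented residual sum of squares equals RSS(b) + lambda2 * ||b||^2:
b <- rnorm(p)
lhs <- sum((y_aug - X_aug %*% b)^2)
rhs <- sum((y - X %*% b)^2) + lambda2 * sum(b^2)
all.equal(lhs, rhs)   # TRUE
```

In particular, the smallest all-zero $\lambda$ for an elastic net can be read off from the lasso formula applied to the augmented data.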