I'm trying to create a reduced model to predict many dependent variables (DVs, ~450) that are highly correlated.
My independent variables (IVs, ~2000) are also numerous and highly correlated.
If I use the lasso to select a reduced model for each output individually, I am not guaranteed to get the same subset of independent variables as I loop over each dependent variable.
Is there a multivariate linear regression that uses the lasso in R?
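The per-response loop described above can be sketched as follows (a minimal illustration with simulated data; the dimensions are shrunk from the question's for speed). Each call to cv.glmnet fits an independent lasso, so the selected predictor sets generally differ across columns:

```r
library(glmnet)

set.seed(1)
dictionary <- matrix(rnorm(100 * 50), 100, 50)   # IVs
target     <- matrix(rnorm(100 * 4), 100, 4)     # DVs

# fit a separate lasso for each dependent variable
supports <- lapply(seq_len(ncol(target)), function(j) {
  cvfit <- cv.glmnet(dictionary, target[, j])
  which(as.vector(coef(cvfit, s = "lambda.min"))[-1] != 0)
})
# in general these supports differ from column to column
```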
This is not the group lasso: the group lasso groups the IVs. I want multivariate linear regression (meaning the DV is a matrix, not a vector of scalars) that also implements the lasso. (Note: as NRH points out, this is not quite true. Group lasso is a general term covering strategies that group the IVs, but also strategies that group other parameters, such as the coefficients across DVs.)
I found this paper that gets into something called Sparse Overlapping Sets Lasso
Here is some code that does multivariate linear regression
> dim(target)
[1] 6060 441
> dim(dictionary)
[1] 6060 2030
> fit = lm(target~dictionary)
Here is some code that does lasso on a single DV
> library(glmnet)
> fit = glmnet(dictionary, target[,1])
And this is what I would like to do:
> fit = glmnet(dictionary, target)
Error in weighted.mean.default(y, weights) :
'x' and 'w' must have the same length
Selecting features that fit ALL the targets at once
Best Answer
For multivariate responses (number of dependent variables larger than 1), you need family = "mgaussian" in the call of glmnet. The lsgl package is an alternative, which provides a more flexible penalty.
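For example, a minimal sketch on simulated data (dimensions shrunk from the question's 6060 × 2030 for speed):

```r
library(glmnet)

set.seed(1)
n <- 100; p <- 50; k <- 5               # observations, predictors, responses
x <- matrix(rnorm(n * p), n, p)
b <- matrix(0, p, k)
b[1:5, ] <- rnorm(5 * k)                # only the first 5 predictors are active
y <- x %*% b + matrix(rnorm(n * k), n, k)

# multi-response lasso: one penalty term per predictor, shared across responses
fit <- glmnet(x, y, family = "mgaussian")

# coef() returns a list of sparse coefficient matrices, one per response
str(coef(fit, s = 0.1), max.level = 1)
```

cv.glmnet accepts the same family argument if you want to choose the tuning parameter by cross-validation.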
With a $k$-dimensional response, the glmnet package implements the penalty $$\sum_{j = 1}^p \| \boldsymbol{\beta}_j \|_2$$
where $\boldsymbol{\beta}_j = (\beta_{j1}, \ldots, \beta_{jk})^T$ is the vector of coefficients for the $j$th predictor. In the help page for glmnet you can read:

This penalty is an example of a group lasso penalty, which groups parameters for the different responses that are associated to the same predictor. It results in the selection of the same predictors across all responses for a given value of the tuning parameter.
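A small self-contained check on simulated data confirms this shared-support behavior: at a fixed value of the tuning parameter, the set of selected predictors should be the same for every response (a sketch, not tied to the question's dimensions):

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 50), 100, 50)
y <- x %*% rbind(matrix(rnorm(5 * 3), 5, 3), matrix(0, 45, 3)) +
     matrix(rnorm(100 * 3), 100, 3)

fit  <- glmnet(x, y, family = "mgaussian")
beta <- coef(fit, s = 0.1)        # list: one sparse matrix per response

# indices of nonzero coefficients (dropping the intercept) for each response
supports <- lapply(beta, function(b) which(as.matrix(b)[-1, ] != 0))
length(unique(supports))          # should be 1: same predictors for all responses
```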
The lsgl package implements sparse group lasso penalties of the form $$\alpha \sum_{j=1}^p \sum_{l = 1}^k \xi_{jl} |\beta_{jl}| + (1-\alpha) \sum_{j = 1}^p \gamma_{j} \| \boldsymbol{\beta}_j \|_2$$
where $\xi_{jl}$ and $\gamma_{j}$ are certain weights chosen to balance the contributions from the different terms. The default is $\xi_{jl} = 1$ and $\gamma_{j} = \sqrt{k}$. The parameter $\alpha \in [0,1]$ is a tuning parameter. With $\alpha = 0$ (and $\gamma_j = 1$) the penalty is equivalent to the penalty used by glmnet with family = "mgaussian". With $\alpha = 1$ (and $\xi_{jl} = 1$) the penalty gives the ordinary lasso. The lsgl implementation also allows for an additional grouping of the predictors.

A note about group lasso. The term group lasso is often associated with a grouping of predictors. However, from a more general viewpoint, group lasso is simply a grouping of parameters in the penalty. The grouping used by glmnet with family = "mgaussian" is a grouping of parameters across responses. The effect of such a grouping is to couple the estimation of the parameters across the responses, which turns out to be a good idea if all the responses can be predicted from roughly the same set of predictors. The general idea of coupling multiple learning problems that are expected to share some structure is known as multi-task learning.