Solved – Multivariate linear regression with lasso in R

Tags: lasso, multivariate regression, r

I'm trying to create a reduced model to predict many (~450) dependent variables (DVs) that are highly correlated.

My independent variables (IV) are also numerous (~2000) and highly correlated.

If I use the lasso to select a reduced model for each output individually, I am not guaranteed to get the same subset of independent variables as I loop over each dependent variable.

Is there a multivariate linear regression that uses the lasso in R?

This is not the group lasso. The group lasso groups the IVs. I want multivariate linear regression (meaning the DV is a matrix, not a vector of scalars) that also implements the lasso. (Note: as NRH points out, this is not true. Group lasso is a general term that includes strategies that group the IVs, but also includes strategies that group other parameters, such as the DVs.)

I found this paper that gets into something called the Sparse Overlapping Sets Lasso.

Here is some code that does multivariate linear regression:

> dim(target)
[1] 6060  441
> dim(dictionary)
[1] 6060 2030
> fit = lm(target~dictionary)

Here is some code that does the lasso on a single DV:

> library(glmnet)
> fit = glmnet(dictionary, target[,1])

And this is what I would like to do:

> fit = glmnet(dictionary, target)
Error in weighted.mean.default(y, weights) : 
  'x' and 'w' must have the same length

In other words, I want to select features that fit ALL of the targets at once.

Best Answer

For multivariate responses (more than one dependent variable), you need family = "mgaussian" in the call to glmnet.
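Using the object names from the question (dictionary as the 6060 x 2030 predictor matrix, target as the 6060 x 441 response matrix), a minimal sketch might look like this:

library(glmnet)

# Multi-response gaussian fit: one group-lasso penalty per predictor,
# shared across all 441 response columns
fit <- glmnet(dictionary, target, family = "mgaussian")

# Cross-validation works the same way and provides a data-driven choice of lambda
cvfit <- cv.glmnet(dictionary, target, family = "mgaussian")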

The lsgl package is an alternative that provides a more flexible penalty.

With a $k$-dimensional response, the glmnet package implements the penalty $$\sum_{j = 1}^p \| \boldsymbol{\beta}_j \|_2$$
where $\boldsymbol{\beta}_j = (\beta_{j1}, \ldots, \beta_{jk})^T$ is the vector of coefficients for the $j$th predictor. In the help page for glmnet you can read:

The former [family = "mgaussian"] allows a multi-response gaussian model to be fit, using a "group -lasso" penalty on the coefficients for each variable. Tying the responses together like this is called "multi-task" learning in some domains.

This penalty is an example of a group lasso penalty, which groups the parameters for the different responses associated with the same predictor. It results in the selection of the same predictors across all responses for a given value of the tuning parameter.
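Continuing the sketch above, you can check this directly: for an mgaussian fit, coef() returns a list with one coefficient matrix per response, and the nonzero rows coincide across the list.

# coef() on an mgaussian fit returns a list of sparse coefficient matrices,
# one per response column, at the requested value of lambda
cf <- coef(cvfit, s = "lambda.min")

# The group penalty zeroes out whole coefficient vectors beta_j at once, so every
# response ends up with the same set of selected predictors
selected <- lapply(cf, function(b) which(b[-1, 1] != 0))  # drop the intercept row
length(unique(selected))  # 1: identical selection for all responses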

The lsgl package implements sparse group lasso penalties of the form $$\alpha \sum_{j=1}^p \sum_{l = 1}^k \xi_{jl} |\beta_{jl}| + (1-\alpha) \sum_{j = 1}^p \gamma_{j} \| \boldsymbol{\beta}_j \|_2$$
where $\xi_{jl}$ and $\gamma_{j}$ are certain weights chosen to balance the contributions from the different terms. The default is $\xi_{jl} = 1$ and $\gamma_{j} = \sqrt{k}$. The parameter $\alpha \in [0,1]$ is a tuning parameter. With $\alpha = 0$ (and $\gamma_j = 1$) the penalty is equivalent to the penalty used by glmnet with family = "mgaussian". With $\alpha = 1$ (and $\xi_{jl} = 1$) the penalty gives ordinary lasso. The lsgl implementation also allows for an additional grouping of the predictors.
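For completeness, here is a rough sketch of an lsgl fit on the same data. The function and argument names below are an assumption based on my reading of the package documentation and have changed between package versions, so check the package documentation before relying on them.

# Hypothetical sketch -- the lsgl interface has varied across versions,
# so these function and argument names may need adjusting
library(lsgl)

# alpha balances the lasso term against the group (l2) term in the penalty above;
# alpha = 0 corresponds to glmnet's family = "mgaussian" penalty, alpha = 1 to the ordinary lasso
lambda <- lsgl::lambda(dictionary, target, alpha = 0.5, d = 25, lambda.min = 0.05)
fit    <- lsgl::fit(dictionary, target, alpha = 0.5, lambda = lambda)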

A note about the group lasso. The term group lasso is often associated with a grouping of the predictors. However, from a more general viewpoint, the group lasso is simply a grouping of parameters in the penalty. The grouping used by glmnet with family = "mgaussian" is a grouping of parameters across responses. The effect of such a grouping is to couple the estimation of the parameters across the responses, which turns out to be a good idea if all the responses can be predicted from roughly the same set of predictors. The general idea of coupling multiple learning problems that are expected to share some structure is known as multi-task learning.
