Solved – Multivariate linear regression with lasso in R

Tags: lasso, multivariate regression, r

I'm trying to create a reduced model to predict many (~450) dependent variables (DVs) that are highly correlated.

My independent variables (IV) are also numerous (~2000) and highly correlated.

If I use the lasso to select a reduced model for each output individually, I am not guaranteed to get the same subset of independent variables as I loop over each dependent variable.

Is there a multivariate linear regression that uses the lasso in R?

This is not the group lasso. The group lasso groups the IVs. I want multivariate linear regression (meaning the DV is a matrix, not a vector of scalars) that also implements the lasso. (Note: as NRH points out, this is not true. Group lasso is a general term that includes strategies that group the IVs, but also includes strategies that group other parameters, such as the DVs.)

I found this paper that gets into something called the Sparse Overlapping Sets Lasso.

Here is some code that does multivariate linear regression:

> dim(target)
[1] 6060  441
> dim(dictionary)
[1] 6060 2030
> fit = lm(target~dictionary)

Here is some code that does the lasso on a single DV:

> library(glmnet)
> fit = glmnet(dictionary, target[,1])

And this is what I would like to do:

> fit = glmnet(dictionary, target)
Error in weighted.mean.default(y, weights) : 
  'x' and 'w' must have the same length

In other words, I want to select features that fit ALL of the targets at once.

Best Answer

For multivariate responses (more than one dependent variable), you need family = "mgaussian" in the call to glmnet.
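Using the object names from the question (dictionary as the 6060 x 2030 predictor matrix, target as the 6060 x 441 response matrix), a minimal sketch might look like this:

library(glmnet)

# Multi-response gaussian fit: one group-lasso penalty per predictor,
# shared across all 441 response columns
fit <- glmnet(dictionary, target, family = "mgaussian")

# Cross-validation works the same way and provides a data-driven choice of lambda
cvfit <- cv.glmnet(dictionary, target, family = "mgaussian")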

The lsgl package is an alternative that provides a more flexible penalty.

With a $k$-dimensional response, the glmnet package implements the penalty $$\sum_{j = 1}^p \| \boldsymbol{\beta}_j \|_2$$
where $\boldsymbol{\beta}_j = (\beta_{j1}, \ldots, \beta_{jk})^T$ is the vector of coefficients for the $j$th predictor. In the help page for glmnet you can read:

The former [family = "mgaussian"] allows a multi-response gaussian model to be fit, using a "group -lasso" penalty on the coefficients for each variable. Tying the responses together like this is called "multi-task" learning in some domains.

This penalty is an example of a group lasso penalty, which groups the parameters for the different responses associated with the same predictor. It results in the selection of the same predictors across all responses for a given value of the tuning parameter.
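Continuing the sketch above, you can check this directly: for an mgaussian fit, coef() returns a list with one coefficient matrix per response, and the nonzero rows coincide across the list.

# coef() on an mgaussian fit returns a list of sparse coefficient matrices,
# one per response column, at the requested value of lambda
cf <- coef(cvfit, s = "lambda.min")

# The group penalty zeroes out whole coefficient vectors beta_j at once, so every
# response ends up with the same set of selected predictors
selected <- lapply(cf, function(b) which(b[-1, 1] != 0))  # drop the intercept row
length(unique(selected))  # 1: identical selection for all responses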

The lsgl package implements sparse group lasso penalties of the form $$\alpha \sum_{j=1}^p \sum_{l = 1}^k \xi_{jl} |\beta_{jl}| + (1-\alpha) \sum_{j = 1}^p \gamma_{j} \| \boldsymbol{\beta}_j \|_2$$
where $\xi_{jl}$ and $\gamma_{j}$ are certain weights chosen to balance the contributions from the different terms. The default is $\xi_{jl} = 1$ and $\gamma_{j} = \sqrt{k}$. The parameter $\alpha \in [0,1]$ is a tuning parameter. With $\alpha = 0$ (and $\gamma_j = 1$) the penalty is equivalent to the penalty used by glmnet with family = "mgaussian". With $\alpha = 1$ (and $\xi_{jl} = 1$) the penalty gives ordinary lasso. The lsgl implementation also allows for an additional grouping of the predictors.
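For completeness, here is a rough sketch of an lsgl fit on the same data. The function and argument names below are an assumption based on my reading of the package documentation and have changed between package versions, so check the package documentation before relying on them.

# Hypothetical sketch -- the lsgl interface has varied across versions,
# so these function and argument names may need adjusting
library(lsgl)

# alpha balances the lasso term against the group (l2) term in the penalty above;
# alpha = 0 corresponds to glmnet's family = "mgaussian" penalty, alpha = 1 to the ordinary lasso
lambda <- lsgl::lambda(dictionary, target, alpha = 0.5, d = 25, lambda.min = 0.05)
fit    <- lsgl::fit(dictionary, target, alpha = 0.5, lambda = lambda)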

A note about the group lasso. The term group lasso is often associated with a grouping of the predictors. However, from a more general viewpoint, the group lasso is simply a grouping of parameters in the penalty. The grouping used by glmnet with family = "mgaussian" is a grouping of parameters across responses. The effect of such a grouping is to couple the estimation of the parameters across the responses, which turns out to be a good idea if all the responses can be predicted from roughly the same set of predictors. The general idea of coupling multiple learning problems that are expected to share some structure is known as multi-task learning.
