Solved – Regression with multiple dependent variables under constraints & feature selection

feature selection, multivariate regression

I have a data set of 1000 records. Each record has three dependent variables $y_1, y_2, y_3$ and 100 independent variables $x_1,\dots,x_{100}$, where the dependent variables satisfy:

  1. $0\le y_i \le1$
  2. $y_1 + y_2 + y_3 =1$

That is, the $y_i$ represent the probabilities of an observation belonging to each of three classes.

Q1: How can I build a (multivariate linear) model using $x_1,\dots,x_{100}$ to predict $(y_1,y_2,y_3)$? Is there an R package available for this, and how would I implement it?

Q2: Since there are so many independent variables $x_1,\dots,x_{100}$ (features), is it possible to do feature selection for this multivariate linear model using LASSO, SCAD, or the elastic net, e.g. with the glmnet package?
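For concreteness, here is a hedged sketch of what Q2 is asking for: glmnet's `family = "multinomial"` fits a penalized multinomial logistic model, with `alpha = 1` giving the lasso penalty and `alpha` in $(0,1)$ the elastic net (glmnet does not implement SCAD). `predict(..., type = "response")` returns fitted probabilities that are nonnegative and sum to 1 by construction, which matches the constraints on $(y_1,y_2,y_3)$. The data below are simulated purely for illustration.

```r
library(glmnet)

set.seed(1)
n <- 1000; p <- 100
X <- matrix(rnorm(n * p), n, p)            # 100 independent variables
y <- sample(1:3, n, replace = TRUE)        # class labels standing in for (y1, y2, y3)

# Lasso-penalized multinomial logistic regression with cross-validated lambda.
# type.multinomial = "grouped" selects each feature jointly across the 3 classes.
cvfit <- cv.glmnet(X, y, family = "multinomial", alpha = 1,
                   type.multinomial = "grouped")

# Fitted probabilities: an n x 3 matrix, each row in [0, 1] and summing to 1.
prob <- predict(cvfit, newx = X, s = "lambda.min", type = "response")[, , 1]

# Selected features: coefficients that survive the penalty at lambda.min.
coefs <- coef(cvfit, s = "lambda.min")
```

If the observed responses are themselves probabilities rather than hard class labels, glmnet's multinomial family also accepts `y` as an $n \times 3$ matrix of proportions, so the same call can be used with the probability matrix directly.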

Best Answer

You can try PLS regression, which is well suited to data sets with multiple dependent variables.

Q1: For a multivariate linear model, you can combine PLS with a clusterwise approach.

Q2: PLS is designed to cope with many independent variables, even with more variables than observations.

An R package for this is available here.
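As a minimal sketch of the PLS route, here is what a fit might look like with the `pls` package on CRAN (an assumption on my part; the answer does not name the package). `plsr()` accepts a multivariate response, so the three columns are predicted jointly. Note that raw PLS predictions are not guaranteed to lie in $[0,1]$ or to sum to 1, so the constrained setting here calls for a post-hoc clip-and-renormalize, shown below on simulated data.

```r
library(pls)

set.seed(1)
n <- 1000; p <- 100
X <- matrix(rnorm(n * p), n, p)
# Simulated compositional response: each row of Y is in [0, 1] and sums to 1.
raw <- matrix(exp(rnorm(n * 3)), n, 3)
Y <- raw / rowSums(raw)

dat <- data.frame(Y = I(Y), X = I(X))

# PLS regression with a multivariate response; the number of components
# (10 here, arbitrarily) would normally be chosen from the CV curve.
fit <- plsr(Y ~ X, ncomp = 10, data = dat, validation = "CV")

pred <- predict(fit, ncomp = 10)[, , 1]   # n x 3 matrix of fitted values
# Raw PLS output need not respect the constraints; clip and renormalize.
pred <- pmax(pred, 0)
pred <- pred / rowSums(pred)
```

After the renormalization step, each row of `pred` is a valid probability vector over the three classes.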

We are working on another open-source implementation that improves on it; it will be available soon.
