Continuous proportions are sometimes modelled using beta regression. A logit transformation of the proportions is sometimes used instead. If there are multiple proportions that sum to 1 (compositional data), this is sometimes handled via Dirichlet models.
These terms should help you find many relevant questions and answers here on CV, and good pointers more generally (Google searches are highly productive).
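For illustration, here is a minimal sketch of a beta regression in R, assuming the betareg package; the simulated data and variable names below are made up:

```
## A hedged sketch: beta regression on a simulated continuous proportion
library(betareg)

set.seed(1)
n  <- 200
x  <- rnorm(n)
mu <- plogis(-0.5 + 1.2 * x)                               # mean on (0, 1) via inverse logit
y  <- rbeta(n, shape1 = mu * 20, shape2 = (1 - mu) * 20)   # beta-distributed proportion

fit <- betareg(y ~ x)     # beta regression, logit link by default
summary(fit)

## Alternative mentioned above: logit-transform the proportion, then ordinary regression
fit_logit <- lm(qlogis(y) ~ x)
```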
References, e.g.:
Smithson, M. and Verkuilen, J. (2006). A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychological Methods, 11(1), 54-71.
http://psychology3.anu.edu.au/people/smithson/details/betareg/Smithson_Verkuilen06.pdf
Also see http://nw08.american.edu/~jernigan/comp.pdf
For multivariate responses (more than one dependent variable), you need family = "mgaussian" in the call to glmnet.
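For example, a minimal sketch on simulated data (object names such as x, y and fit are just illustrative):

```
## Fit a multi-response Gaussian model with glmnet
library(glmnet)

set.seed(1)
n <- 100; p <- 20; k <- 3              # observations, predictors, responses
x <- matrix(rnorm(n * p), n, p)
b <- matrix(0, p, k)
b[1:5, ] <- rnorm(5 * k)               # only the first 5 predictors are active
y <- x %*% b + matrix(rnorm(n * k), n, k)

fit   <- glmnet(x, y, family = "mgaussian")
cvfit <- cv.glmnet(x, y, family = "mgaussian")   # cross-validated lambda
```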
The lsgl package is an alternative that provides a more flexible penalty.
With a $k$-dimensional response, the glmnet package implements the penalty
$$\sum_{j = 1}^p \| \boldsymbol{\beta}_j \|_2$$
where $\boldsymbol{\beta}_j = (\beta_{j1}, \ldots, \beta_{jk})^T$ is the vector of coefficients for the $j$th predictor. In the help page for glmnet you can read:

The former [family = "mgaussian"] allows a multi-response gaussian model to be fit, using a "group-lasso" penalty on the coefficients for each variable. Tying the responses together like this is called "multi-task" learning in some domains.
This penalty is an example of a group lasso penalty, which groups the parameters for the different responses that are associated with the same predictor. It results in the selection of the same predictors across all responses for a given value of the tuning parameter.
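Continuing the sketch above, you can check this shared support directly; for family = "mgaussian", coef() returns one coefficient matrix per response:

```
## One coefficient matrix per response; the group-lasso penalty forces the
## set of nonzero predictors to be the same across responses at a given lambda
cf <- coef(cvfit, s = "lambda.min")
nz <- lapply(cf, function(m) which(as.matrix(m)[-1, 1] != 0))  # drop the intercept row
nz   # the selected predictors are identical across the responses
```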
The lsgl package implements sparse group lasso penalties of the form
$$\alpha \sum_{j=1}^p \sum_{l = 1}^k \xi_{jl} |\beta_{jl}| + (1-\alpha) \sum_{j = 1}^p \gamma_{j} \| \boldsymbol{\beta}_j \|_2$$
where $\xi_{jl}$ and $\gamma_{j}$ are weights chosen to balance the contributions from the different terms. The default is $\xi_{jl} = 1$ and $\gamma_{j} = \sqrt{k}$. The parameter $\alpha \in [0,1]$ is a tuning parameter. With $\alpha = 0$ (and $\gamma_j = 1$) the penalty is equivalent to the penalty used by glmnet with family = "mgaussian". With $\alpha = 1$ (and $\xi_{jl} = 1$) the penalty gives the ordinary lasso. The lsgl implementation also allows for an additional grouping of the predictors.
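To make the role of $\alpha$ concrete, here is a small illustrative function (not part of the lsgl package) that evaluates the penalty above for a $p \times k$ coefficient matrix, using the default weights:

```
## Sparse group lasso penalty from the formula above, with xi_jl = 1 and
## gamma_j = sqrt(k) by default; alpha interpolates between the two extremes
sgl_penalty <- function(B, alpha, xi = 1, gamma = sqrt(ncol(B))) {
  lasso_part <- sum(xi * abs(B))                  # sum_j sum_l xi_jl |beta_jl|
  group_part <- sum(gamma * sqrt(rowSums(B^2)))   # sum_j gamma_j ||beta_j||_2
  alpha * lasso_part + (1 - alpha) * group_part
}

B <- matrix(rnorm(6 * 3), nrow = 6, ncol = 3)  # example p = 6, k = 3 coefficient matrix
sgl_penalty(B, alpha = 1)                      # ordinary lasso penalty
sgl_penalty(B, alpha = 0, gamma = 1)           # group lasso penalty as in glmnet's "mgaussian"
```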
A note about group lasso. The term group lasso is often associated with a grouping of predictors. However, from a more general viewpoint, group lasso is simply a grouping of parameters in the penalty. The grouping used by glmnet with family = "mgaussian" is a grouping of parameters across responses. The effect of such a grouping is to couple the estimation of the parameters across the responses, which turns out to be a good idea if all the responses can be predicted from roughly the same set of predictors. The general idea of coupling multiple learning problems that are expected to share some structure is known as multi-task learning.
As mentioned in the Wikipedia article on seemingly unrelated regressions, SUR is equivalent to equation-by-equation OLS in either of the following two cases: (1) the error terms are in fact uncorrelated between the equations (so that the equations are truly unrelated), or (2) each equation contains exactly the same set of regressors on the right-hand side.
That being said, if your model and data do not fall into either of these two cases, then you can proceed with SUR.
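For illustration, a SUR fit can be obtained in R with the systemfit package; the simulated data and equation names below are made up:

```
## A hedged sketch: SUR vs. equation-by-equation OLS with systemfit
library(systemfit)

set.seed(1)
n   <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
e   <- MASS::mvrnorm(n, mu = c(0, 0), Sigma = matrix(c(1, 0.7, 0.7, 1), 2))  # correlated errors
dat$y1 <- 1 + 2 * dat$x1 + dat$x2 + e[, 1]
dat$y2 <- -1 + dat$x1 + 3 * dat$x3 + e[, 2]

eqs <- list(first = y1 ~ x1 + x2, second = y2 ~ x1 + x3)  # different regressors per equation
fit_sur <- systemfit(eqs, method = "SUR", data = dat)
fit_ols <- systemfit(eqs, method = "OLS", data = dat)
summary(fit_sur)
```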