Solved – Various methods for predicting multiple dependent variables (python)

multivariate analysis, multivariate regression, python, regression, scikit-learn

I would like to model and predict multiple dependent variables depending on one or more independent variables. The most straightforward method appears to be multivariate regression. I was wondering though whether there are any other methods one might want to take into consideration. And does the case change if the independent variables are a mix of continuous and categorical variables?

I'm using Python and mostly the sklearn package. Any advice specific to that, or pointers to other packages I might look into, is also appreciated. So far I've tried using a set of linear regression models or a set of regression trees. This gives a different result from multivariate analysis (probably because the dependent variables are correlated), but I haven't figured out how to do that yet. I'm sure this functionality must have been provided somewhere.

This is the first time I'm posting a question. I didn't go into detail about what the data means exactly. Please let me know if there is anything I should clarify.

Best Answer

A possible solution is to train a prediction model for each dependent variable using all the independent variables in each case. Indeed, you can use different models in each case (in case you want to handle categorical and numerical data with different models).
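As a rough sketch of this per-variable approach in scikit-learn, the snippet below fits a separate pipeline for each dependent variable, with a different estimator for each one. The data is synthetic and the column names (`x_num`, `x_cat`, `y1`, `y2`), the helper `make_preprocessor`, and the choice of estimators are assumptions made for illustration only. The categorical column is one-hot encoded so that both model types can consume it.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (illustrative only): one numeric and one categorical predictor.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "x_num": rng.normal(size=n),
    "x_cat": rng.choice(["a", "b", "c"], size=n),
})
is_a = (X["x_cat"] == "a").astype(float)
Y = pd.DataFrame({
    "y1": 2.0 * X["x_num"] + is_a + rng.normal(scale=0.1, size=n),
    "y2": -1.0 * X["x_num"] ** 2 + rng.normal(scale=0.1, size=n),
})

def make_preprocessor():
    # Hypothetical helper: one-hot encode the categorical column,
    # pass the numeric column through unchanged.
    return ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), ["x_cat"])],
        remainder="passthrough",
    )

# One model per dependent variable; the estimators may differ per target.
models = {
    "y1": make_pipeline(make_preprocessor(), LinearRegression()),
    "y2": make_pipeline(
        make_preprocessor(), DecisionTreeRegressor(max_depth=4, random_state=0)
    ),
}
for target, model in models.items():
    model.fit(X, Y[target])

# Collect the per-target predictions into one table.
predictions = pd.DataFrame(
    {target: model.predict(X) for target, model in models.items()},
    index=X.index,
)
```

If the same estimator is acceptable for every target, `sklearn.multioutput.MultiOutputRegressor` wraps this loop in a single object.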

Note that because this approach treats each dependent variable independently, any relationships between the predicted variables (if they exist) are not taken into account.
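One option that does let the targets inform each other is a regressor chain, available in scikit-learn as `sklearn.multioutput.RegressorChain`. This is a different technique from the per-variable approach above, not something the original answer proposes: each model in the chain receives the predictors plus the earlier targets in the chain as extra features. The sketch below uses synthetic data, and the base estimator (`Ridge`) and the chain order are arbitrary assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.multioutput import RegressorChain

# Synthetic data (illustrative only): the second target depends on the first.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y1 = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
y2 = 0.8 * y1 + X[:, 0] + rng.normal(scale=0.1, size=200)
Y = np.column_stack([y1, y2])

# order=[0, 1]: fit y1 from X, then fit y2 from X plus y1.
chain = RegressorChain(Ridge(alpha=1.0), order=[0, 1])
chain.fit(X, Y)

# At prediction time the chain feeds its own y1 prediction into the y2 model.
Y_pred = chain.predict(X[:5])
```

Several scikit-learn estimators, such as `LinearRegression`, `DecisionTreeRegressor`, and `RandomForestRegressor`, also accept a two-dimensional `Y` directly in `fit`, which may be worth trying before building a chain.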

Also check out the scikit-multilearn package, though I have no experience using it.
