Solved – How to combine results of logistic regression and random forest

logistic, machine learning, random forest

I am new to machine learning. I applied logistic regression and random forest to the same dataset, so I have a variable importance from each (absolute coefficients for logistic regression and variable importance for random forest). I am thinking of combining the two to get a final variable importance. Can anyone share their experience? I've looked at bagging, boosting, and ensemble modeling, but they are not what I need: they are more about combining information for the same model across replicates. What I am looking for is a way to combine the results of multiple models.

Best Answer

It probably depends on what you want to use the variable importances for. Is it to be used as a criterion for feature selection for a third classification model? In that case you could try computing a weighted average of the variable importances (maybe after normalizing each individual variable importance vector to unit length) for various values of the averaging weight, and then pick the value that yields the best cross-validated score for the final model.
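A minimal sketch of that idea in plain Python, under a few assumptions: the importance values are made-up illustrations, the features are assumed to have been standardized before fitting (so that absolute coefficients are comparable), and `cv_score` is a hypothetical callback standing in for whatever cross-validated score you compute for the final model on a given feature subset. The function names are illustrative, not from any library.

```python
import math

def unit_normalize(values):
    """Scale a vector to unit Euclidean length (left as-is if all zeros)."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)

def combine_importances(lr_coefs, rf_importances, weight):
    """Weighted average of the two unit-normalized importance vectors.

    weight is the share given to logistic regression (between 0 and 1).
    """
    lr = unit_normalize([abs(c) for c in lr_coefs])
    rf = unit_normalize(rf_importances)
    return [weight * a + (1.0 - weight) * b for a, b in zip(lr, rf)]

def top_k_features(combined, k):
    """Indices of the k largest combined importances, largest first."""
    order = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
    return order[:k]

def pick_weight(lr_coefs, rf_importances, k, cv_score,
                weights=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Try each averaging weight and keep the one with the best score."""
    best = None
    for w in weights:
        combined = combine_importances(lr_coefs, rf_importances, w)
        features = top_k_features(combined, k)
        score = cv_score(features)  # hypothetical: CV score of the final model
        if best is None or score > best[0]:
            best = (score, w, features)
    return best

# Illustrative values only (three features).
lr_coefs = [3.0, 0.0, -4.0]
rf_importances = [0.0, 1.0, 0.0]
combined = combine_importances(lr_coefs, rf_importances, 0.5)
selected = top_k_features(combined, 2)
```

In practice `cv_score` would refit the third model on the selected features inside a cross-validation loop; ties between weights are resolved here in favor of the first weight tried.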

As for combining the outputs of the logistic regression model and the random forest model (without considering variable importances), the following blog post is very informative and demonstrates that a plain average of the outputs is a simple yet very effective ensemble method for regression models.
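For two classifiers, one way to apply the same averaging idea is to average their predicted positive-class probabilities and threshold the result. The sketch below assumes both models have already scored the same samples in the same order; the probabilities are made-up illustrations and the function names are hypothetical.

```python
def average_probabilities(lr_probs, rf_probs):
    """Element-wise mean of two models' predicted probabilities."""
    if len(lr_probs) != len(rf_probs):
        raise ValueError("both models must score the same samples")
    return [(a + b) / 2.0 for a, b in zip(lr_probs, rf_probs)]

def classify(probs, threshold=0.5):
    """Turn averaged probabilities into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probs]

# Illustrative values only: positive-class probabilities for three samples.
lr_probs = [0.9, 0.2, 0.6]
rf_probs = [0.7, 0.4, 0.3]
avg_probs = average_probabilities(lr_probs, rf_probs)
labels = classify(avg_probs)
```

An unequal weighting of the two models is a straightforward extension, with the weight chosen by cross-validation as in the answer above.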