Solved – Ensemble models in R

caret, ensemble-learning, r

I have a clinical dataset (1400 cases) and I applied 4 data mining techniques (ANN, Decision Tree, SVM, Logistic Regression) to predict the binary outcome (Yes, No).

Now, I want to improve prediction accuracy through ensemble methods.
What are the criteria to choose which model can be combined with another model?
And how can that be done in R? Can I use the "caret" package?

Best Answer

Let me answer your questions in reverse order:

  1. Yes this can be done in R.
  2. You can use the caret package to compare models, but to build an ensemble automatically you can use the caretEnsemble package. Read the vignettes first!
  3. Building ensembles manually is as much art as it is science, but it gives you more control over what is happening. The approach depends on which kind of ensembling you want to do.

    Voting ensembles: combine the class predictions of multiple models by majority vote. For example, if you have the predictions of 3 models, and 2 models predict a 1 and 1 model predicts a 0, the outcome is 1.

    Averaging: average the outputs of multiple models by taking the mean. For a binary outcome this usually means averaging the predicted probabilities.

    With both voting and averaging, weakly correlated models tend to combine better than highly correlated ones, because they are less likely to make the same mistakes. But even highly correlated models might improve the final answer, so it is worth checking.
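
As a rough illustration of voting, averaging, and the correlation check described above, here is a minimal base R sketch. The data are simulated, and the names `noisy_prob`, `majority_vote`, `average_prob`, and the three model columns are hypothetical stand-ins for the predicted probabilities of your own fitted models.

```r
# Simulated stand-in data: true labels plus predicted P(Yes) from three
# hypothetical models. Replace `probs` with your own models' predictions.
set.seed(42)
n <- 200
truth <- rbinom(n, 1, 0.4)  # 1 = Yes, 0 = No

# Assumption for illustration only: each model outputs a noisy probability
# centred on 0.8 for true "Yes" cases and 0.2 for true "No" cases.
noisy_prob <- function(truth, sd) {
  p <- 0.5 + 0.3 * (2 * truth - 1) + rnorm(length(truth), sd = sd)
  pmin(pmax(p, 0), 1)  # clip to [0, 1]
}
probs <- cbind(
  glm  = noisy_prob(truth, 0.25),
  tree = noisy_prob(truth, 0.30),
  svm  = noisy_prob(truth, 0.25)
)

# Correlation between model outputs: lower values suggest the models
# make different errors and are better candidates for combining.
model_cor <- cor(probs)

# Voting: threshold each model's probability, then take the majority class.
majority_vote <- function(prob_matrix, threshold = 0.5) {
  votes <- prob_matrix > threshold
  as.integer(rowSums(votes) > ncol(prob_matrix) / 2)
}

# Averaging: mean predicted probability per case.
average_prob <- function(prob_matrix) rowMeans(prob_matrix)

vote_pred <- majority_vote(probs)
avg_pred  <- as.integer(average_prob(probs) > 0.5)

# Compare each single model with the two ensembles.
accuracy <- function(pred) mean(pred == truth)
single_acc   <- apply(probs > 0.5, 2, function(col) accuracy(as.integer(col)))
ensemble_acc <- c(vote = accuracy(vote_pred), average = accuracy(avg_pred))
print(round(model_cor, 2))
print(c(single_acc, ensemble_acc))
```

With real models, compute the accuracies on held-out data (or out-of-fold predictions) rather than on the data the models were trained on, otherwise the comparison is optimistic.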

There are more methods, but starting with these is a good way to see what is going on.

mlwave has written a very good guide to this, which also covers stacking and blending.
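
For the automated route from point 2, the following is a hedged sketch using caret and caretEnsemble. It assumes both packages are installed, uses simulated data from caret's `twoClassSim()` in place of the clinical dataset, and fits only two model types to keep it short. Argument names and defaults can differ between caretEnsemble versions, so check the vignettes for the version you have.

```r
library(caret)
library(caretEnsemble)

set.seed(42)
dat <- twoClassSim(300)  # simulated stand-in data; the outcome column is `Class`

# All base models must share the same resampling folds, and their
# out-of-fold predictions must be saved so they can be combined.
ctrl <- trainControl(
  method = "cv",
  number = 5,
  savePredictions = "final",
  classProbs = TRUE,
  index = createFolds(dat$Class, k = 5, returnTrain = TRUE)
)

# Fit several caret models in one call (logistic regression and a tree here;
# other caret methods can be added to methodList if their packages are installed).
models <- caretList(
  Class ~ .,
  data = dat,
  trControl = ctrl,
  methodList = c("glm", "rpart")
)

# Correlation of resampled performance between the base models.
print(modelCor(resamples(models)))

# Combine the base models with a linear blend of their predictions.
ens <- caretEnsemble(models)
print(summary(ens))
```

The blend is fitted on the saved out-of-fold predictions, which is why `savePredictions` and a shared `index` are set in `trainControl()`.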