Solved – How to stack machine learning models in R

ensemble-learning, machine-learning, r, stacking

I am new to machine learning and R.

I know there is an R package called caretEnsemble that can conveniently stack models in R. However, this package seems to have some problems when dealing with multi-class classification tasks.

For now, I wrote some code to stack the models manually, and here is the example I worked on:

    library(caret)
    library(AppliedPredictiveModeling)

    # Build the data set and split it into training and test sets
    set.seed(123)
    data(AlzheimerDisease)
    adData = data.frame(diagnosis, predictors)
    inTrain = createDataPartition(adData$diagnosis, p = 3 / 4)[[1]]
    training = adData[inTrain, ]
    testing = adData[-inTrain, ]

    # Train three base learners on the training set
    set.seed(62433)
    modelFitRF <- train(diagnosis ~ ., data = training, method = "rf")
    modelFitGBM <- train(diagnosis ~ ., data = training, method = "gbm", verbose = FALSE)
    modelFitLDA <- train(diagnosis ~ ., data = training, method = "lda")

    # Predict on the test set with each base learner
    predRF <- predict(modelFitRF, newdata = testing)
    predGBM <- predict(modelFitGBM, newdata = testing)
    predLDA <- predict(modelFitLDA, newdata = testing)

    confusionMatrix(predRF, testing$diagnosis)$overall[1]
    #Accuracy 
    #0.7682927 

    confusionMatrix(predGBM, testing$diagnosis)$overall[1]
    #Accuracy 
    #0.7926829 

    confusionMatrix(predLDA, testing$diagnosis)$overall[1]
    #Accuracy 
    #0.7682927

Now I have three models (modelFitRF, modelFitGBM and modelFitLDA) and, for each, a vector of predictions on the test set.

Then I create a data frame holding these prediction vectors together with the original dependent variable from the test set:

    predDF <- data.frame(predRF, predGBM, predLDA, diagnosis = testing$diagnosis, stringsAsFactors = FALSE)

Then I used this data frame as a new training set to build a stacked model:

    modelStack <- train(diagnosis ~ ., data = predDF, method = "rf")
    combPred <- predict(modelStack, predDF)
    confusionMatrix(combPred, testing$diagnosis)$overall[1]
    #Accuracy 
    #0.804878

Since stacking usually improves prediction accuracy, I'd like to believe this is the right way to stack the models. However, I have doubts, because predDF is built from the three models' predictions on the test set.

I am not sure whether it is valid to use predictions generated on the test set and then apply the stacked model back to that same test set to get the final predictions.
(I am referring to the block below:)

    predDF <- data.frame(predRF, predGBM, predLDA, diagnosis = testing$diagnosis, stringsAsFactors = FALSE)
    modelStack <- train(diagnosis ~ ., data = predDF, method = "rf")
    combPred <- predict(modelStack, predDF)
    confusionMatrix(combPred, testing$diagnosis)$overall[1]

Best Answer

What you're doing here is what I refer to as "holdout stacking" (sometimes also called blending, though that term is used for regular stacking as well): you use a holdout set to generate the training data for the metalearning algorithm (i.e. predDF). I use the term holdout stacking to distinguish it from regular stacking (or "Super Learning"), where the training data for the metalearner (in your case, a random forest) comes from cross-validated predictions of the base learners rather than from a holdout set (your testing frame).
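
For reference, here is a minimal sketch of how those cross-validated predictions could be obtained with caret; the 5-fold setup is illustrative, and the base learners are retrained with savePredictions turned on so that their out-of-fold predictions are kept:

    # Use the same CV folds for every base learner and keep out-of-fold predictions
    set.seed(62433)
    cvIndex <- createFolds(training$diagnosis, k = 5, returnTrain = TRUE)
    ctrl <- trainControl(method = "cv", index = cvIndex, savePredictions = "final")

    modelFitRF <- train(diagnosis ~ ., data = training, method = "rf", trControl = ctrl)
    modelFitGBM <- train(diagnosis ~ ., data = training, method = "gbm", trControl = ctrl, verbose = FALSE)
    modelFitLDA <- train(diagnosis ~ ., data = training, method = "lda", trControl = ctrl)

    # Align each learner's out-of-fold predictions by training-set row
    cvRF <- modelFitRF$pred[order(modelFitRF$pred$rowIndex), "pred"]
    cvGBM <- modelFitGBM$pred[order(modelFitGBM$pred$rowIndex), "pred"]
    cvLDA <- modelFitLDA$pred[order(modelFitLDA$pred$rowIndex), "pred"]
    levelOne <- data.frame(cvRF, cvGBM, cvLDA, diagnosis = training$diagnosis)

The metalearner would then be trained on levelOne instead of a holdout-based predDF, so no data has to be sacrificed to a validation set.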

The problem here is not how you're doing the stacking, but how you're evaluating the results. Once you've used the testing frame to generate the predDF frame, you have to throw that data away and not use it for model evaluation. In your example, you are also using the testing frame to evaluate the performance of the base models and the ensemble learner.

To fix this, just partition off another chunk of your data. You should have three datasets: training, validation and testing. Use the validation set to create predDF (also known as the "level one" dataset in stacking terminology).
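
For example, a minimal sketch of such a three-way split; the 50/25/25 proportions are purely illustrative:

    # Illustrative 50/25/25 split into training, validation and testing
    set.seed(123)
    inTrain <- createDataPartition(adData$diagnosis, p = 0.5)[[1]]
    training <- adData[inTrain, ]
    remainder <- adData[-inTrain, ]
    inVal <- createDataPartition(remainder$diagnosis, p = 0.5)[[1]]
    validation <- remainder[inVal, ]
    testing <- remainder[-inVal, ]

The base learners would then be retrained on this smaller training set.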

    # Generate level-one dataset for training the ensemble metalearner
    predRF <- predict(modelFitRF, newdata = validation)
    predGBM <- predict(modelFitGBM, newdata = validation)
    predLDA <- predict(modelFitLDA, newdata = validation)
    predDF <- data.frame(predRF, predGBM, predLDA, diagnosis = validation$diagnosis, stringsAsFactors = FALSE)

    # Train the ensemble
    modelStack <- train(diagnosis ~ ., data = predDF, method = "rf")

Then evaluate your base learners and your ensemble on the testing set to get a better idea of how the ensemble compares to the individual learners.

    # Generate predictions on the test set
    testPredRF <- predict(modelFitRF, newdata = testing)
    testPredGBM <- predict(modelFitGBM, newdata = testing)
    testPredLDA <- predict(modelFitLDA, newdata = testing)

    # Using the base learner test set predictions,
    # create the level-one dataset to feed to the ensemble
    testPredLevelOne <- data.frame(testPredRF, testPredGBM, testPredLDA, diagnosis = testing$diagnosis, stringsAsFactors = FALSE)
    combPred <- predict(modelStack, testPredLevelOne)

    # Evaluate ensemble test performance
    confusionMatrix(combPred, testing$diagnosis)$overall[1]

    # Evaluate base learner test performance
    confusionMatrix(testPredRF, testing$diagnosis)$overall[1]
    confusionMatrix(testPredGBM, testing$diagnosis)$overall[1]
    confusionMatrix(testPredLDA, testing$diagnosis)$overall[1]

Lastly, as a suggestion, I'd recommend trying a GLM as the metalearning algorithm; in my experience GLMs tend to perform better than tree-based metalearners, though that is not always the case.
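
With the level-one data above, that swap is a one-liner. Note that caret's "glm" method handles two-class outcomes like this one; for a genuinely multi-class response you would need something like method = "multinom" instead:

    # Swap the random forest metalearner for a GLM
    modelStack <- train(diagnosis ~ ., data = predDF, method = "glm")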

If you're specifically looking for multiclass support in stacking, it will be available soon in the h2o R package. If you don't need multiclass, you can check out the SuperLearner or h2o packages, which make stacking easier than writing it all out by hand; see the SuperLearner() and h2o.stackedEnsemble() functions for doing stacking with one line of code.
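
For illustration only, here is a rough sketch of what h2o stacking looks like (not tested against this data; per the h2o docs, the base learners must be cross-validated on the same folds with their CV predictions kept):

    library(h2o)
    h2o.init()
    train <- as.h2o(training)
    y <- "diagnosis"
    x <- setdiff(names(train), y)

    # Base learners: same folds, keep the cross-validated predictions
    rf <- h2o.randomForest(x, y, train, nfolds = 5,
                           fold_assignment = "Modulo",
                           keep_cross_validation_predictions = TRUE)
    gbm <- h2o.gbm(x, y, train, nfolds = 5,
                   fold_assignment = "Modulo",
                   keep_cross_validation_predictions = TRUE)

    # One line of stacking: the metalearner trains on the CV predictions
    ensemble <- h2o.stackedEnsemble(x, y, train, base_models = list(rf, gbm))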