Solved – How to evaluate stacking ensemble model vs. other models with 10-fold cross-validation

cross-validation, ensemble learning, stacking

I've been comparing various predictive models for both continuous and binary outcomes in a health care setting.

So far 10-fold cross-validation has been useful: training models on 9/10 of the analysis dataset, scoring and evaluating prediction performance on the remaining 1/10, and repeating for each of the ten folds.

I'd like to implement a stacked generalization (stacking) ensemble model and compare it with my prior (non-stacked) models.

Question: What is the proper procedure for evaluating a stacking ensemble model vs. the other models with 10-fold cross-validation?

Am I correct that I need to further divide each of the 10 training folds into two subsets, A and B, and follow steps 1-4 below for each of the i=1 to 10 folds?

1) Train the stage 0 ensemble models (logistic regression, random forests, etc.) on training subset i_A,

2) Score training subset i_B records with the stage 0 models to generate the model predictions,

3) Train the stage 1 ensemble stacker on the predictions from training subset i_B, and finally

4) Score the corresponding test subset i with the stage 1 ensemble model created in Step 3 and compare predictive performance with other non-stacked models.

I'm not sure if steps 1-4 are properly called nested cross-validation or 2-fold stacking.

Best Answer

From what I have seen in Kaggle competitions, this is not exactly how it is done in practice, but it is quite close. The second-level model (the stacker) is evaluated with cross-validation, and within each CV training set the first-level models are again run through cross-validation to produce out-of-fold predictions. That is close to what you have written, except that your subsets i_A and i_B are drawn by cross-validation rather than by a single split.
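As a rough illustration of that nested scheme, here is a minimal sketch using scikit-learn. It assumes synthetic binary data and two arbitrary first-level models; the dataset, the model choices, the 5 inner folds, and the AUC metric are all my assumptions, not something taken from the question. `StackingClassifier` builds the stacker's training features from inner-CV predictions of the first-level models, and the outer 10-fold loop scores the stack and the plain models on the same test folds.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the analysis dataset (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# First-level (stage 0) models; the choice is illustrative only.
base_models = [
    ("logit", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
]

# Second-level (stage 1) stacker. cv=5 is the inner CV: the stacker is
# trained on cross-validated predictions of the first-level models.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),
    cv=5,
)

# Outer 10-fold CV: the same folds are used for every candidate model,
# so the stacked and non-stacked scores are directly comparable.
outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
candidates = dict(base_models, stack=stack)
scores = {
    name: cross_val_score(model, X, y, cv=outer, scoring="roc_auc")
    for name, model in candidates.items()
}

for name, fold_scores in scores.items():
    print(f"{name}: mean AUC {fold_scores.mean():.3f} (sd {fold_scores.std():.3f})")
```

Because the whole stack is refit inside each outer training set, no record from an outer test fold is ever seen by the first-level models or by the stacker before scoring. For continuous outcomes, `StackingRegressor` with `KFold` and a regression metric should play the same role.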

An example of this is here, in the out-of-fold predictions part of the code (although the author only applies CV to the first-level models).
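The out-of-fold step can be sketched by hand as follows. This is not the linked code, only a minimal version under my own assumptions: `out_of_fold_predictions` is a hypothetical helper name, the data are synthetic, and the model is assumed to be a binary classifier exposing `predict_proba`.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold


def out_of_fold_predictions(model, X, y, n_splits=5, seed=0):
    """Return one prediction per row, each made by a copy of `model`
    that was fitted without seeing that row (hypothetical helper)."""
    oof = np.zeros(len(y))
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, held_idx in folds.split(X, y):
        # Fit a fresh copy on the other folds, predict the held-out fold.
        fitted = clone(model).fit(X[fit_idx], y[fit_idx])
        oof[held_idx] = fitted.predict_proba(X[held_idx])[:, 1]
    return oof


# Synthetic data for illustration only (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=1)
oof = out_of_fold_predictions(LogisticRegression(max_iter=1000), X, y)
print(oof[:5])
```

The resulting vector is what the second-level model is trained on, one column per first-level model. Used on its own like this, it only builds the stacker's training features; to get a fair comparison with the non-stacked models, this step would have to be repeated inside each outer training fold.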

Also, this book (p. 500) clearly describes how to combine stacking and cross-validation. Here is the interesting part:

The steps are described here:
