Solved – Stacking models trained on different features of a data set for a classification problem

machine-learning, stacking

Normally, the first-layer models in stacking and blending methods are trained on all of the features in a data set. How, though, would a model perform if its first-layer models were each trained on different features of the same data set?

For example, suppose a model containing 4 first-layer models is built to predict the labels of a data set with 12 features. Instead of using all 12 features to train each first-layer model, the features are divided into 4 subsets, and each subset is used to train one of the sub-models, as sketched below. How would the accuracy of this model compare to that of a model whose first-layer models are each trained on all 12 features?
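To make the split concrete, a minimal sketch (assuming numpy and an equal 4-way partition of the feature indices, neither of which is required by the setup) might look like:

```python
# Hypothetical 4-way partition of the 12 feature indices,
# one disjoint subset per first-layer model.
import numpy as np

feature_subsets = np.array_split(np.arange(12), 4)
print(feature_subsets)
# [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8]), array([ 9, 10, 11])]
```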

Thanks!

Best Answer

Stacking ensembles are usually heterogeneous ensembles that use learners of different types. For an ensemble to be more accurate than any of its individual members, the base learners must be as accurate as possible and as diverse as possible. Diversity can be achieved by using different learners, by sub-sampling the data and/or the features, or by using learners with different parameter settings. One of the base learners might itself be a random forest, in which case the data and the features are already being sub-sampled internally.

The potential danger in training each base model on a subset of the features is that individual accuracy degrades in exchange for greater diversity. The net effect is dataset specific and may or may not improve the accuracy of the overall ensemble; cross-validation will give an experimental answer to this question.
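To get that experimental answer, here is a minimal sketch using scikit-learn. The synthetic data set, the choice of four base learners, and the equal 4-way feature split are all illustrative assumptions, not something fixed by the question:

```python
# Compare a stack whose base learners see all 12 features against a
# stack whose base learners each see a disjoint subset of 3 features.
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic data set with 12 features, as in the question.
X, y = make_classification(n_samples=1000, n_features=12, random_state=0)

def base_learners():
    # Heterogeneous first-layer models (arbitrary but typical choices).
    return [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),
        ("nb", GaussianNB()),
        ("tree", DecisionTreeClassifier(random_state=0)),
    ]

def on_columns(cols, estimator):
    # Restrict an estimator to a subset of feature columns.
    return make_pipeline(
        ColumnTransformer([("keep", "passthrough", cols)]), estimator
    )

# Design 1: every first-layer model is trained on all 12 features.
stack_all = StackingClassifier(
    estimators=base_learners(),
    final_estimator=LogisticRegression(),
)

# Design 2: the 12 features are split into 4 disjoint subsets,
# one per first-layer model.
splits = np.array_split(np.arange(12), 4)
stack_split = StackingClassifier(
    estimators=[
        (name, on_columns(cols.tolist(), est))
        for (name, est), cols in zip(base_learners(), splits)
    ],
    final_estimator=LogisticRegression(),
)

for label, model in [("all features", stack_all),
                     ("feature subsets", stack_split)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{label}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

On a real data set, how the features are grouped (randomly, by correlation, or by domain knowledge) can matter as much as the split itself, so it is worth cross-validating several groupings rather than a single fixed partition.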