Actually, scikit-learn does provide such functionality, though it can be a bit tricky to implement. Here is a complete working example of an averaging regressor built on top of three models. First, let's import all the required packages:
from sklearn.base import TransformerMixin
from sklearn.datasets import make_regression
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge
Then, we need to convert our three regressor models into transformers. This will allow us to merge their predictions into a single feature vector using FeatureUnion:
class RidgeTransformer(Ridge, TransformerMixin):
    def transform(self, X, *_):
        # Expose the model's predictions as a one-column feature matrix.
        return self.predict(X).reshape(len(X), -1)


class RandomForestTransformer(RandomForestRegressor, TransformerMixin):
    def transform(self, X, *_):
        return self.predict(X).reshape(len(X), -1)


class KNeighborsTransformer(KNeighborsRegressor, TransformerMixin):
    def transform(self, X, *_):
        return self.predict(X).reshape(len(X), -1)
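As a quick sanity check (a minimal sketch; the toy data and variable names here are purely illustrative), each wrapped model now emits its predictions as a single feature column:

X_demo, y_demo = make_regression(n_samples=50, n_features=4)
rt = RidgeTransformer().fit(X_demo, y_demo)
print(rt.transform(X_demo).shape)  # (50, 1): one prediction per sample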
Now, let's define a builder function for our Frankenstein model:
def build_model():
    ridge_transformer = Pipeline(steps=[
        ('scaler', StandardScaler()),
        ('poly_feats', PolynomialFeatures()),
        ('ridge', RidgeTransformer())
    ])

    # Concatenate the three models' predictions into a 3-column feature matrix.
    pred_union = FeatureUnion(
        transformer_list=[
            ('ridge', ridge_transformer),
            ('rand_forest', RandomForestTransformer()),
            ('knn', KNeighborsTransformer())
        ],
        n_jobs=2
    )

    # The final linear regression learns how to weight each model's prediction.
    model = Pipeline(steps=[
        ('pred_union', pred_union),
        ('lin_regr', LinearRegression())
    ])

    return model
Finally, let's fit the model:
print('Build and fit a model...')
model = build_model()
X, y = make_regression(n_features=10)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print('Done. Score:', score)
Output:
Build and fit a model...
Done. Score: 0.9600413867438636
Why bother complicating things in such a way? Well, this approach lets us optimize the model's hyperparameters with standard scikit-learn tools such as GridSearchCV or RandomizedSearchCV. It also makes it easy to save a pre-trained model to disk and load it back later.
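Here is a minimal sketch of both ideas. The parameter values and file name are illustrative assumptions, but the double-underscore parameter names follow directly from the step names defined in build_model() above:

from sklearn.model_selection import GridSearchCV
from joblib import dump, load

# Hypothetical search space; step names come from build_model() above.
param_grid = {
    'pred_union__ridge__poly_feats__degree': [1, 2, 3],
    'pred_union__knn__n_neighbors': [3, 5, 9],
    'pred_union__rand_forest__n_estimators': [20, 50],
}

search = GridSearchCV(build_model(), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)

# Persist the tuned ensemble and load it back later (file name is illustrative).
dump(search.best_estimator_, 'ensemble.joblib')
model = load('ensemble.joblib')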
Best Answer
I have experimented with the following methods of combining predictions, with varying degrees of success:
Whatever method you choose, you should ensure that it is appropriately cross-validated. In some instances, it would be very easy to overfit, especially using (3) above.
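As a concrete illustration of that cross-validation advice (a minimal sketch in Python, reusing build_model() from the answer above; the fold count is arbitrary), score the whole ensemble on held-out folds rather than on its own training data:

from sklearn.model_selection import cross_val_score

# Each fold fits the full ensemble from scratch, so the score is honest.
scores = cross_val_score(build_model(), X, y, cv=5)
print(scores.mean(), scores.std())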
There are some R packages built for combining predictions. caretEnsemble is fantastic for combining models tuned with the caret package. I understand that H2O and SuperLearner are built with ensembling in mind, though I've not used those packages extensively.