Solved – Ensemble of different kinds of regressors using scikit-learn (or any other Python framework)

ensemble-learning, regression, scikit-learn

I am trying to solve a regression task. I found that three models work nicely for different subsets of the data: LassoLARS, SVR and Gradient Tree Boosting. I noticed that when I make predictions with all three models and put the true output and the outputs of my three models side by side in a table, at least one of the models is always really close to the true output, even though the other two can be relatively far off.

When I compute the minimal possible error (taking the prediction of the 'best' predictor for each test example), I get an error that is much smaller than the error of any single model. So I thought about combining the predictions of these three different models into some kind of ensemble. The question is, how to do this properly? All three of my models are built and tuned with scikit-learn; does it provide some method for packing models into an ensemble? The problem is that I don't want to simply average the predictions of all three models: I want a weighted combination, where the weights are determined by the properties of the specific example.
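For concreteness, this is roughly how I compute that minimal possible error (a sketch; preds_lasso, preds_svr and preds_gbt stand for the per-example predictions of my three fitted models on the test set, and y_true for the true outputs):

import numpy as np

# Stack the per-model predictions into shape (n_samples, 3)
preds = np.column_stack([preds_lasso, preds_svr, preds_gbt])

# Absolute error of each model on each example
abs_err = np.abs(preds - np.asarray(y_true)[:, None])

# 'Oracle' error: for each example, keep only the best model's error
oracle_mae = abs_err.min(axis=1).mean()
print('Oracle MAE:', oracle_mae)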

Even if scikit-learn does not provide such functionality, it would be nice if someone knows how to properly address this task of figuring out the weighting of each model for each example in the data. I think it might be done by a separate regressor built on top of these three models, which would try to output optimal weights for each of the three models, but I am not sure whether this is the best way of doing it.
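To illustrate the idea, here is a rough sketch of one possible variant, where a gating classifier (rather than a weight-outputting regressor) learns which model to trust for each example; lasso, svr and gbt are placeholders for my three fitted models, and (X_val, y_val) is a held-out set:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Predictions of the three fitted models on the held-out set
preds = np.column_stack([m.predict(X_val) for m in (lasso, svr, gbt)])

# For each example, the index of the model closest to the truth
best_model = np.abs(preds - np.asarray(y_val)[:, None]).argmin(axis=1)

# The gate learns to predict that index from the input features
gate = RandomForestClassifier().fit(X_val, best_model)

def ensemble_predict(X_new):
    # Route each example to the model the gate picks for it
    new_preds = np.column_stack([m.predict(X_new) for m in (lasso, svr, gbt)])
    return new_preds[np.arange(len(X_new)), gate.predict(X_new)]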

Best Answer

Actually, scikit-learn does provide such functionality, though it might be a bit tricky to implement. Here is a complete working example of such an averaging regressor built on top of three models. First of all, let's import all the required packages:

from sklearn.base import TransformerMixin
from sklearn.datasets import make_regression
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge

Then, we need to convert our three regressor models into transformers. This will allow us to merge their predictions into a single feature vector using FeatureUnion:

class RidgeTransformer(Ridge, TransformerMixin):

    def transform(self, X, *_):
        # Return predictions as a column so FeatureUnion can
        # concatenate them with the other models' predictions
        return self.predict(X).reshape(len(X), -1)


class RandomForestTransformer(RandomForestRegressor, TransformerMixin):

    def transform(self, X, *_):
        return self.predict(X).reshape(len(X), -1)


class KNeighborsTransformer(KNeighborsRegressor, TransformerMixin):

    def transform(self, X, *_):
        return self.predict(X).reshape(len(X), -1)

Now, let's define a builder function for our Frankenstein model:

def build_model():
    ridge_transformer = Pipeline(steps=[
        ('scaler', StandardScaler()),
        ('poly_feats', PolynomialFeatures()),
        ('ridge', RidgeTransformer())
    ])

    pred_union = FeatureUnion(
        transformer_list=[
            ('ridge', ridge_transformer),
            ('rand_forest', RandomForestTransformer()),
            ('knn', KNeighborsTransformer())
        ],
        n_jobs=2
    )

    model = Pipeline(steps=[
        ('pred_union', pred_union),
        ('lin_regr', LinearRegression())
    ])

    return model

Finally, let's fit the model:

print('Build and fit a model...')

model = build_model()

X, y = make_regression(n_features=10)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model.fit(X_train, y_train)
score = model.score(X_test, y_test)

print('Done. Score:', score)

Output:

Build and fit a model...
Done. Score: 0.9600413867438636

Why bother complicating things in such a way? Because this approach lets us optimize the model's hyperparameters with standard scikit-learn tools such as GridSearchCV or RandomizedSearchCV, and it makes it easy to save a pre-trained model to disk and load it back later.
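For example, a grid search over the combined model could look like this (a sketch: the parameter paths follow the step names defined in build_model above, and the grid values are purely illustrative):

from sklearn.model_selection import GridSearchCV
import joblib

param_grid = {
    # Nested parameters are addressed with the '<step>__<param>' convention
    'pred_union__ridge__poly_feats__degree': [2, 3],
    'pred_union__ridge__ridge__alpha': [0.1, 1.0, 10.0],
    'pred_union__rand_forest__n_estimators': [50, 100],
    'pred_union__knn__n_neighbors': [3, 5, 10],
}

search = GridSearchCV(build_model(), param_grid, cv=5)
search.fit(X_train, y_train)
print('Best params:', search.best_params_)

# The whole ensemble persists like any single estimator
joblib.dump(search.best_estimator_, 'ensemble.joblib')
model = joblib.load('ensemble.joblib')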
