Solved – Does a smaller learning rate help performance of a Gradient Boosting Regressor

boosting

This page shows how a learning rate of less than 1.0 can improve the performance of a Gradient Boosting Classifier in sklearn. It shows that, over many trees, a smaller learning rate plateaus at a lower test deviance:

http://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_regularization.html#sphx-glr-auto-examples-ensemble-plot-gradient-boosting-regularization-py
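For reference, a condensed sketch of the kind of comparison that page makes is below. This is my own abbreviation, not the linked code verbatim: the dataset is the same make_hastie_10_2 used there, but the estimator count, tree depth, and train/test split sizes are my choices.

import numpy as np
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss

# binary classification data with built-in label noise, split into train and test halves
X, y = make_hastie_10_2(n_samples=4000, random_state=1)
X_train, X_test, y_train, y_test = X[:2000], X[2000:], y[:2000], y[2000:]

for lr in (1.0, 0.1):
    clf = GradientBoostingClassifier(n_estimators=200, max_depth=2,
                                     learning_rate=lr, random_state=0)
    clf.fit(X_train, y_train)
    # deviance (log loss) on the held-out half after the final boosting stage
    print(lr, log_loss(y_test, clf.predict_proba(X_test)))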

I have been trying to duplicate that outcome with a Gradient Boosting Regressor, not a classifier, and have not been able to. For every target function I tried to fit with regression, a learning rate of 1.0 was always superior to a learning rate of 0.1. Most of the functions I looked at were various combinations of sine and cosine in 2D space.

Here is the code I ran for the GB regressor. Instead of my current z = np.sin(XY[:,0] * 10 + XY[:,1]*3) + np.cos(XY[:,0] *2 ) + 3*np.sin(XY[:,1] * 5), what target function can I use for which a smaller learning rate will be better? Or is that only applicable to classification, not regression?

# License: BSD 3 clause

# importing necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import ensemble

# Create the dataset
np.random.seed(10)
XY = np.mgrid[0:10.1:0.5, 0:10.1:0.5].reshape(2,-1).T
z = np.sin(XY[:,0] * 10 + XY[:,1]*3) + np.cos(XY[:,0] *2 ) + 3*np.sin(XY[:,1] * 5)

# base parameters shared by both regressors (the learning rate is varied below)
original_params = {'n_estimators': 10000, 'max_leaf_nodes': 4, 'max_depth': None,
                   'random_state': 2, 'min_samples_split': 5}

plt.figure()

for label, color, setting in [('No shrinkage', 'orange',
                               {'learning_rate': 1.0, 'subsample': 1.0}),
                              ('learning_rate=0.1', 'turquoise',
                               {'learning_rate': 0.1, 'subsample': 1.0})  ]:
    params = dict(original_params)
    params.update(setting)

    clf = ensemble.GradientBoostingRegressor(**params)
    clf.fit(XY, z)

    # compute squared-error deviance on the training grid at each boosting stage
    test_deviance = np.zeros((params['n_estimators'],), dtype=np.float64)

    for i, z_pred in enumerate(clf.staged_predict(XY)):
        # staged_predict yields the model's predictions after each iteration
        test_deviance[i] = np.mean((z - z_pred) ** 2)

    plt.plot((np.arange(test_deviance.shape[0]) + 1)[::5], test_deviance[::5],
            '-', color=color, label=label)



plt.legend(loc='upper left')
plt.xlabel('Boosting Iterations')
plt.ylabel('Training Set Deviance (squared error)')

plt.show()

Best Answer

You have not added any random noise to your data. The learning rate (shrinkage) is a regularization strategy that protects your model from overfitting. If your data is noiseless, and you include all the variables that are actually related to the response, then you will not overfit, and shrinkage has nothing to protect you from.

Try something like this:

z_signal = np.sin(XY[:,0] * 10 + XY[:,1]*3) + np.cos(XY[:,0] *2 ) + 3*np.sin(XY[:,1] * 5)
z = z_signal + np.random.normal(size=XY.shape[0])
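For completeness, a fuller sketch along these lines is below; the noise scale (the numpy default of 1.0), the train/test split, and the reduced estimator count are my own assumptions rather than part of this answer.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

np.random.seed(10)
XY = np.mgrid[0:10.1:0.5, 0:10.1:0.5].reshape(2, -1).T
z_signal = (np.sin(XY[:, 0] * 10 + XY[:, 1] * 3) + np.cos(XY[:, 0] * 2)
            + 3 * np.sin(XY[:, 1] * 5))
z = z_signal + np.random.normal(size=XY.shape[0])   # signal plus Gaussian noise

# hold out part of the grid so that overfitting actually shows up in the score
XY_train, XY_test, z_train, z_test = train_test_split(XY, z, random_state=2)

for lr in (1.0, 0.1):
    reg = GradientBoostingRegressor(n_estimators=2000, max_leaf_nodes=4,
                                    min_samples_split=5, learning_rate=lr,
                                    random_state=2)
    reg.fit(XY_train, z_train)
    print(lr, mean_squared_error(z_test, reg.predict(XY_test)))

With noise in the target and an honest held-out set, the learning_rate=0.1 run should generally end up with the lower test error, mirroring the classifier figure.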