Solved – Stochastic Gradient Descent – how to choose the learning rate

classification, gradient descent

I have a large dataset and I want to train an SGD classifier (using sklearn.linear_model.SGDClassifier), since it's impossible to fit all the data in memory.

How should I choose the model's parameters and the learning rate (alpha in particular)? Thanks.

Best Answer

Setting the learning rate is often a tricky business that requires some trial and error. The general approach is to divide your data into training, validation, and test sets. Start with a relatively high learning rate and watch how the error on your validation set changes (if it's not dropping at all, your learning rate is probably too high). Once the validation error stops decreasing, lower the learning rate and keep training until the validation error plateaus again. Keep repeating this until lowering the learning rate no longer improves the validation error. Finally, once you're happy with your error rate, evaluate on the test set.
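As a rough illustration of the procedure above, here is a minimal sketch using SGDClassifier with partial_fit, which also suits data that has to be fed in chunks. The synthetic data, the starting eta0 of 0.1, the divide-by-10 step, the patience of 3 epochs, and the batch size are all assumptions made for the example, not recommended values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption: real data would be read in chunks).
X, y = make_classification(n_samples=6000, n_features=20, random_state=0)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

eta0 = 0.1        # relatively high starting learning rate (illustrative)
min_eta0 = 1e-4   # stop once the learning rate drops below this (illustrative)
patience = 3      # epochs without improvement before lowering the rate
batch = 500       # chunk size passed to partial_fit

# learning_rate="constant" keeps the step size fixed at eta0 until we change it.
clf = SGDClassifier(learning_rate="constant", eta0=eta0, random_state=0)
classes = np.unique(y_train)
rng = np.random.default_rng(0)

best_err, stale, history = np.inf, 0, []
for epoch in range(200):
    # One pass over the training data, one chunk at a time.
    order = rng.permutation(len(X_train))
    for start in range(0, len(order), batch):
        idx = order[start:start + batch]
        clf.partial_fit(X_train[idx], y_train[idx], classes=classes)

    val_err = 1.0 - clf.score(X_val, y_val)
    history.append((epoch, eta0, val_err))

    if val_err < best_err - 1e-4:
        best_err, stale = val_err, 0
    else:
        stale += 1

    # Validation error has plateaued: lower the learning rate and continue.
    if stale >= patience:
        eta0 /= 10
        if eta0 < min_eta0:
            break
        clf.set_params(eta0=eta0)
        stale = 0

# The test set is touched only once, after all tuning is finished.
test_err = 1.0 - clf.score(X_test, y_test)
print(f"best validation error: {best_err:.3f}, test error: {test_err:.3f}")
```

Printing history shows at which epochs the rate was lowered and whether each drop actually helped. If the validation error never falls even in the first few epochs, try a smaller starting eta0.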

The logic is that you're first figuring out the coarse area of parameter space that is globally best, then fine-tuning with a lower step size. An important point here is that you should be doing this tuning on the validation set, to avoid using the test data to fit your hyperparameters.
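Since the question mentions alpha: in SGDClassifier, alpha is the regularization strength (it also feeds into the step size when learning_rate="optimal"), so it is a hyperparameter to pick on the validation set as well. Below is a minimal sketch of that selection, assuming synthetic data and an illustrative grid of candidate values. The test set is used only once at the end.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption for the example).
X, y = make_classification(n_samples=6000, n_features=20, random_state=0)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Illustrative grid; the useful range depends on the data.
candidate_alphas = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2]

val_errors = {}
for alpha in candidate_alphas:
    clf = SGDClassifier(alpha=alpha, random_state=0)
    clf.fit(X_train, y_train)
    # Compare candidates on the validation set only.
    val_errors[alpha] = 1.0 - clf.score(X_val, y_val)

best_alpha = min(val_errors, key=val_errors.get)

# Refit with the chosen value and report the test error once.
final_clf = SGDClassifier(alpha=best_alpha, random_state=0).fit(X_train, y_train)
test_error = 1.0 - final_clf.score(X_test, y_test)
print(f"best alpha: {best_alpha}, test error: {test_error:.3f}")
```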