Solved – Parameter n_iter in scikit-learn’s SGDClassifier

Tags: scikit-learn, stochastic gradient descent

I have a question about the n_iter parameter of scikit-learn's SGDClassifier. Here is the definition from the documentation:

n_iter : int, optional

The number of passes over the training data (aka epochs). The number of iterations is set to 1 using partial_fit. Defaults to 5.
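For reference, n_iter is simply passed at construction time (note that newer scikit-learn releases have renamed this parameter max_iter):

from sklearn.linear_model import SGDClassifier

# Five passes over the training data by default (renamed max_iter in newer releases).
clf = SGDClassifier(n_iter=5)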

For a data set of size $n$, I can think of two interpretations for the text above:

  • Interpretation 1: The algorithm picks only n_iter data points at random in total and computes the gradient at those points, so that the total number of gradient evaluations is n_iter (so only 5 by default?!).
  • Interpretation 2: The algorithm goes through the whole data set n_iter times, visiting all $n$ points in each of the n_iter passes (essentially random sampling without replacement), so that the total number of gradient evaluations is $n \times$ n_iter. (Both readings are sketched in code below.)
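In code, the two readings would look roughly like this (a schematic sketch with made-up variable names, not scikit-learn's actual implementation):

import numpy as np

rng = np.random.RandomState(0)
n, n_iter = 1000, 5

# Interpretation 1: only n_iter gradient evaluations in total.
for _ in range(n_iter):
    i = rng.randint(n)              # pick one random data point
    pass                            # evaluate the gradient at point i (5 evaluations total)

# Interpretation 2: n_iter full passes (epochs) over the data.
for epoch in range(n_iter):
    for i in rng.permutation(n):    # shuffle, then visit every point once
        pass                        # evaluate the gradient at point i (n * n_iter total)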

Given that scikit-learn's advice (quoted below) is to pick n_iter equal to np.ceil(10**6 / n), I have trouble understanding how, for $n = 1{,}000{,}000$, the algorithm is expected to converge after just one gradient evaluation if interpretation 1 above is correct…

Empirically, we found that SGD converges after observing approx. 10^6 training samples. Thus, a reasonable first guess for the number of iterations is n_iter = np.ceil(10**6 / n), where n is the size of the training set.
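To spell out the arithmetic of that heuristic (with hypothetical dataset sizes): under interpretation 2 the total number of samples observed is $n \times$ n_iter ≈ $10^6$ whatever the value of $n$, whereas under interpretation 1 it would be just n_iter.

import numpy as np

for n in [1_000, 50_000, 1_000_000]:
    n_iter = int(np.ceil(10**6 / n))
    # Under interpretation 2, total gradient evaluations = n * n_iter.
    print(f"n={n:>9}  n_iter={n_iter:>5}  total observed={n * n_iter}")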

Could someone shed some light on this?

The description of the function is here, and the technical documentation is here.

Best Answer

It must be the second interpretation.

I always answer these questions by looking at the source code (which in sklearn is of very high quality and written extremely clearly). The function in question is here (I searched for SGDClassifier, then followed the function calls until I got to this one, which is a low-level routine).

Breaking out the important piece:

for epoch in range(n_iter):        # outer loop: one iteration per epoch
    ...
    for i in range(n_samples):     # inner loop: one gradient step per training sample
        ...

That's exactly the pattern you would expect for n_iter passes over the full training data.
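If you want to check this without reading the source, the docstring's remark that "the number of iterations is set to 1 using partial_fit" suggests a way to reproduce n_iter passes by hand — a minimal sketch on toy data (old API parameter names; newer releases call the parameter max_iter):

import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
X = rng.randn(200, 4)                            # toy data, assumed for illustration
y = (X[:, 0] > 0).astype(int)

clf = SGDClassifier(random_state=0)
for epoch in range(5):                           # each call makes exactly one pass over X
    clf.partial_fit(X, y, classes=np.array([0, 1]))
# After the loop the model has seen 5 * 200 = 1000 samples, i.e. interpretation 2.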