I have a question about the `n_iter` parameter of `SGDClassifier` in scikit-learn. Here is its definition:
> `n_iter` : int, optional
>
> The number of passes over the training data (aka epochs). The number of iterations is set to 1 when using `partial_fit`. Defaults to 5.
For a data set of size $n$, I can think of two interpretations of the text above:

- Interpretation 1: the algorithm picks only `n_iter` data points in total and computes the gradient at those points, so that the total number of gradient evaluations is `n_iter` (so only 5 by default?!).
- Interpretation 2: the algorithm goes through the whole data set `n_iter` times, picking in each of the `n_iter` passes all $n$ points (basically random sampling without replacement), so that the total number of gradient evaluations is $n \times{}$ `n_iter`.
Given that scikit-learn's advice is to pick `n_iter` equal to `np.ceil(10**6 / n)`, I have trouble understanding how, for $n = 1{,}000{,}000$, the algorithm is expected to converge after just one gradient computation if interpretation 1 above is correct…

> Empirically, we found that SGD converges after observing approx. 10^6 training samples. Thus, a reasonable first guess for the number of iterations is `n_iter = np.ceil(10**6 / n)`, where `n` is the size of the training set.
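To see what that heuristic actually produces, here is a small NumPy sketch that just evaluates the suggested formula for a few training-set sizes (the sizes are arbitrary examples, not from the scikit-learn docs):

```python
import numpy as np

# Evaluate the suggested first guess n_iter = ceil(10**6 / n)
# for a few example training-set sizes.
for n in (1_000, 100_000, 1_000_000, 5_000_000):
    n_iter = int(np.ceil(10**6 / n))
    print(f"n = {n:>9,}  ->  n_iter = {n_iter}")
```

So for $n = 1{,}000{,}000$ the formula indeed gives `n_iter = 1`, which is the case the question is about.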
Could someone shed some light on this?
The description of the function is here and the technical documentation is here.
Best Answer
It must be the second.
I always answer these questions by looking at the source code (which in sklearn is of very high quality, and is written extremely clearly). The function in question is here (I searched for `SGDClassifier`, then followed the function calls until I got to this one, which is a low-level routine). Breaking out the important piece:
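The low-level routine is written in Cython, but its nested-loop structure can be sketched in plain Python. Everything below is an illustrative approximation, not sklearn's actual code: the function name `plain_sgd_sketch`, the learning-rate handling, and the hinge-loss update are all simplified assumptions made to expose the loop pattern.

```python
import numpy as np

def plain_sgd_sketch(X, y, n_iter, eta=0.01, seed=0):
    """Simplified sketch of the nested-loop pattern in sklearn's low-level
    SGD routine (hypothetical names; assumes a fixed learning rate and a
    plain hinge-loss update). Returns the weights and the total number of
    gradient evaluations performed."""
    rng = np.random.RandomState(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    n_grad_evals = 0
    for epoch in range(n_iter):              # outer loop: one full epoch
        order = rng.permutation(n_samples)   # shuffle: each sample used once
        for i in order:                      # inner loop: all n samples
            margin = y[i] * (X[i] @ w)
            if margin < 1:                   # hinge-loss subgradient step
                w += eta * y[i] * X[i]
            n_grad_evals += 1
    return w, n_grad_evals

X = np.random.RandomState(1).randn(20, 3)
y = np.sign(X[:, 0])                         # toy linearly separable labels
w, evals = plain_sgd_sketch(X, y, n_iter=5)
print(evals)  # 100 gradient evaluations = 20 samples x 5 epochs
```

The point is the loop nesting: the epoch loop runs `n_iter` times, and inside it every one of the $n$ samples is visited once, so the total work is $n \times{}$ `n_iter`.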
That's exactly the pattern you would expect for
n_iter
passes over the full training data.