Solved – Issues with stochastic gradient descent

gradient descent, stochastic gradient descent, stochastic-processes

I am using stochastic gradient descent to learn a model. Here is the plot of the objective function value over the iterations. I am trying to maximize the function value.

[Plot: objective function value per iteration]

Averaging the values over every 500 iterations, I get this next plot.

[Plot: objective value averaged over 500-iteration windows]

As you can see, the averaged function value is increasing, so it looks as though the algorithm is converging. However, the first plot shows that the function value never actually went past a certain threshold. Even though the lowest value reached per iteration kept improving, the highest value attained stayed almost the same.

So can we say the algorithm is converging?

Best Answer

Yes, you can say that the algorithm is converging because it is increasing the objective value on average.
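If it helps, here is a minimal sketch (not your exact setup) of how you can smooth the per-iteration objective values with a 500-iteration moving average, as in your second plot, to judge the trend rather than the noisy raw trace; the name `objective_trace` is just a placeholder for whatever values you recorded.

```python
import numpy as np

def moving_average(values, window=500):
    """Moving average of the objective trace over a sliding window."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# objective_trace = [...]            # objective value recorded at each SGD iteration
# smoothed = moving_average(objective_trace, window=500)
```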

The trickiest part of stochastic gradient descent (SGD) is the learning rate. Common choices are 1/t, 1/sqrt(t), and D/(G*t), where t is the iteration number, D is the diameter of your feasible set, and G is a bound on the infinity norm of the gradients. You should experiment with these. You can even split your data in two and cross-validate the learning rate.
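As a rough illustration of those schedules, here is a small sketch; the base step size `eta0` and the specific default values for `D` and `G` are assumptions for illustration only, not something taken from your problem.

```python
import numpy as np

def learning_rate(t, schedule="inv_t", eta0=1.0, D=1.0, G=1.0):
    """Return the step size at iteration t (t >= 1) for a few common schedules."""
    if schedule == "inv_t":        # eta_t = eta0 / t
        return eta0 / t
    if schedule == "inv_sqrt_t":   # eta_t = eta0 / sqrt(t)
        return eta0 / np.sqrt(t)
    if schedule == "diameter":     # eta_t = D / (G * t); D: diameter, G: gradient bound
        return D / (G * t)
    raise ValueError(f"unknown schedule: {schedule}")
```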

Another thing you can try is mini-batch SGD. In this variant, instead of using a single data point to compute the gradient, you use a batch of points, say 10 or 20. This way, the objective-versus-iteration plot (your first plot) will be smoother and will look more like your second plot. A sketch of the idea is given below.
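Here is a minimal sketch of mini-batch stochastic gradient ascent (ascent, since you are maximizing); `grad_fn(theta, batch)` is a hypothetical function standing in for your model's gradient, averaged over a mini-batch, and the step-size decay 1/sqrt(t) is just one of the choices mentioned above.

```python
import numpy as np

def minibatch_sgd(grad_fn, theta0, data, batch_size=20, n_iters=10_000, eta0=0.1):
    """Maximize an objective with mini-batch stochastic gradient ascent."""
    theta = np.array(theta0, dtype=float)
    n = len(data)
    rng = np.random.default_rng(0)
    for t in range(1, n_iters + 1):
        idx = rng.choice(n, size=batch_size, replace=False)   # sample a mini-batch
        batch = [data[i] for i in idx]
        grad = grad_fn(theta, batch)            # gradient averaged over the batch
        theta += (eta0 / np.sqrt(t)) * grad     # ascent step with 1/sqrt(t) decay
    return theta
```

Averaging the gradient over a batch reduces its variance, which is exactly why the per-iteration objective curve becomes smoother.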
