Solved – RANDOM learning rate in gradient descent

gradient descent, machine learning, neural networks, optimization

I read this paper on the "Cyclical Learning Rate" method, which cyclically decreases and then INCREASES the learning rate in gradient descent:
http://arxiv.org/abs/1506.01186

Can anyone point me to (or does anyone know of) cases where someone tried randomly changing the learning rate instead of following a prescribed learning-rate schedule?

For example, randomly choosing a learning rate uniformly within some bounds every N iterations, or drawing the rate from a normal distribution?
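To make the question concrete, here is a minimal sketch of the kind of schedule I have in mind (the function name, bounds, and resampling interval are just illustrative, not from any paper):

```python
import numpy as np

def sgd_random_lr(grad_fn, w, n_iters=1000, resample_every=50,
                  lr_low=1e-4, lr_high=1e-1, seed=0):
    """Gradient descent where the learning rate is redrawn uniformly
    from [lr_low, lr_high] every `resample_every` iterations."""
    rng = np.random.default_rng(seed)
    for t in range(n_iters):
        if t % resample_every == 0:
            lr = rng.uniform(lr_low, lr_high)  # redraw the rate
        w = w - lr * grad_fn(w)                # plain gradient step
    return w

# toy usage: minimize f(w) = ||w||^2, whose gradient is 2w
w_opt = sgd_random_lr(lambda w: 2 * w, w=np.array([3.0, -2.0]))
print(w_opt)
```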

Best Answer

I just want to add, three years after the question was raised, that there is a paper submitted to ICLR 2019:

Learning with Random Learning Rates: https://openreview.net/pdf?id=S1fcnoR9K7

The main idea of the paper is:

We present the All Learning Rates At Once (Alrao) optimization method for neural networks: each unit or feature in the network gets its own learning rate sampled from a random distribution spanning several orders of magnitude. This comes at practically no computational cost. Perhaps surprisingly, stochastic gradient descent (SGD) with Alrao performs close to SGD with an optimally tuned learning rate, for various architectures and problems.

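To illustrate the idea, here is a rough sketch (not the authors' implementation; the layer sizes and the log-uniform range are placeholders) of a per-unit random learning rate for a single linear layer, where each output unit gets its own fixed rate sampled log-uniformly over several orders of magnitude:

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out = 100, 10                       # illustrative layer sizes
W = rng.normal(0, 0.1, size=(n_out, n_in))  # weights of one linear layer

# one learning rate per output unit, log-uniform across several orders of magnitude
unit_lrs = 10.0 ** rng.uniform(-5, -1, size=n_out)

def sgd_step_per_unit(W, grad_W):
    """SGD update where row i (all weights feeding unit i) uses unit_lrs[i]."""
    return W - unit_lrs[:, None] * grad_W

# toy usage with a random stand-in for the gradient
grad_W = rng.normal(size=W.shape)
W = sgd_step_per_unit(W, grad_W)
```

Units whose sampled rate happens to be in a good range do most of the learning, which is why the method can track a well-tuned single learning rate without tuning.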