Solved – Why we call ADAM an adaptive learning rate algorithm if the step size is a constant

adam, deep-learning

In the book "Deep Learning" by Goodfellow et al., the ADAM algorithm is described in sub-chapter 8.5, "Algorithms with Adaptive Learning Rates".

To my understanding, an adaptive learning rate should automatically change the value of the step size during the iterations. However, according to the pseudo-code in Algorithm 8.7 (see picture below), the step size $\epsilon$ is a constant.

Thus, in what sense is ADAM an adaptive learning rate algorithm?

[Figure: Algorithm 8.7, the ADAM pseudo-code from the book]
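For reference, the core of the update in Algorithm 8.7 can be sketched (in the book's notation, with gradient estimate $g$, decay rates $\rho_1, \rho_2$, iteration count $t$, and stabilization constant $\delta$) roughly as:

$$
\begin{aligned}
s &\leftarrow \rho_1 s + (1-\rho_1)\, g, \qquad
r \leftarrow \rho_2 r + (1-\rho_2)\, g \odot g,\\
\hat{s} &= \frac{s}{1-\rho_1^t}, \qquad
\hat{r} = \frac{r}{1-\rho_2^t},\\
\Delta\theta &= -\,\epsilon\, \frac{\hat{s}}{\sqrt{\hat{r}} + \delta}.
\end{aligned}
$$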

Best Answer

In the weight update, the bias correction of the moment estimates varies exponentially with the number of iterations completed: the factors $1/(1-\rho_1^t)$ and $1/(1-\rho_2^t)$ start large and decay toward 1. More importantly, although the step-size hyperparameter $\epsilon$ is constant, the update actually applied to each weight is $-\epsilon\,\hat{s}/(\sqrt{\hat{r}} + \delta)$, which is rescaled per parameter by the running estimates of the gradient's first and second moments. The effective step size therefore changes over the course of training and differs across parameters, hence "adaptive".
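To make this concrete, here is a minimal NumPy sketch of the update in the book's notation; the `adam_step` helper and the oscillating-versus-steady gradient pattern are illustrative assumptions, not from the question. A coordinate whose gradient keeps flipping sign accumulates a large second moment relative to its first moment, so its effective step shrinks, while a coordinate with a steady gradient keeps taking steps of size roughly $\epsilon$.

```python
import numpy as np

def adam_step(theta, grad, s, r, t, eps=1e-3, rho1=0.9, rho2=0.999, delta=1e-8):
    """One ADAM update in the notation of Algorithm 8.7 (hypothetical helper).

    theta: parameters; grad: gradient at theta;
    s, r:  running first/second moment estimates; t: 1-based step count.
    """
    s = rho1 * s + (1 - rho1) * grad        # biased first-moment estimate
    r = rho2 * r + (1 - rho2) * grad**2     # biased second-moment estimate
    s_hat = s / (1 - rho1**t)               # bias corrections: large early,
    r_hat = r / (1 - rho2**t)               # decaying toward 1 as t grows
    update = -eps * s_hat / (np.sqrt(r_hat) + delta)
    return theta + update, s, r, update

theta = np.zeros(2)
s, r = np.zeros(2), np.zeros(2)
for t in range(1, 7):
    # Coordinate 0: gradient oscillates in sign; coordinate 1: steady gradient.
    grad = np.array([(-1.0) ** t, 1.0])
    theta, s, r, update = adam_step(theta, grad, s, r, t)
    print(t, update)  # per-parameter effective steps, despite a constant eps
```

Even with $\epsilon$ fixed, the printed updates show effective per-parameter step sizes that differ substantially after a few iterations: the oscillating coordinate's step collapses while the steady coordinate's step stays near $\epsilon$.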