ADAM Optimizer – What Is a Random Variable in ADAM Optimizer?

adam, machine-learning, mathematical-statistics, optimization

Look at the definition of the Adam optimizer, from the original paper by Kingma and Ba:

See algorithm $1$ for pseudo-code of our proposed algorithm Adam. Let $f(\theta)$ be a noisy objective function: a stochastic scalar function that is differentiable w.r.t. parameters $\theta$. We are interested in minimizing the expected value of this function, $\mathbb{E}[f(\theta)]$ w.r.t. its parameters $\theta$.
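For reference, the update in that Algorithm 1 can be written as a minimal NumPy sketch (the default hyperparameters $\alpha = 0.001$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$ are the ones suggested in the paper; the function name and interface here are my own):

```python
import numpy as np

def adam_step(theta, grad, m, v, t,
              alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Algorithm 1 of Kingma & Ba); t starts at 1."""
    m = beta1 * m + (1 - beta1) * grad       # biased first-moment estimate
    v = beta2 * v + (1 - beta2) * grad**2    # biased second-moment estimate
    m_hat = m / (1 - beta1**t)               # bias-corrected first moment
    v_hat = v / (1 - beta2**t)               # bias-corrected second moment
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```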

Of course, this definition is given in the context of minimizing a loss function. But what I don't understand is how we can say that we want to minimize $\mathbb{E}[f(\theta)]$ when $f$ has a clear formula (for example, MSE). In the paper it makes sense, since they say that $f(\theta)$ is a stochastic function, but I don't see where this stochastic part comes from when we just want to minimize, say, the MSE, which does not have any stochastic part.

Could you please explain to me where the stochastic part of the loss function is, and why it makes sense to take the expectation? What is the random variable?

Best Answer

Converting my comment into an answer.

The sentence right below the passage you quoted from the paper is the answer.

The stochasticity might come from the evaluation at random subsamples (minibatches) of datapoints, or arise from inherent function noise.
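In other words, even if the full-data MSE is a deterministic function of $\theta$, the loss actually evaluated at step $t$ is computed on a random minibatch $\mathcal{B}_t$, so $f_t(\theta)$ is a random variable whose randomness comes from the draw of $\mathcal{B}_t$:

$$f_t(\theta) = \frac{1}{|\mathcal{B}_t|} \sum_{i \in \mathcal{B}_t} \big(y_i - \hat{y}_i(\theta)\big)^2, \qquad \mathbb{E}_{\mathcal{B}_t}\big[f_t(\theta)\big] = \frac{1}{n} \sum_{i=1}^{n} \big(y_i - \hat{y}_i(\theta)\big)^2.$$

That is, the expectation over uniformly sampled minibatches equals the full-data MSE, which is why minimizing $\mathbb{E}[f(\theta)]$ is the natural objective. A quick numerical illustration (a minimal sketch; the toy linear data and the fixed batch size of 32 are my own choices for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                     # toy inputs (made up)
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=1000)
theta = np.zeros(5)                                # some fixed parameter value

def mse(X, y, theta):
    return np.mean((y - X @ theta) ** 2)

full_mse = mse(X, y, theta)                        # deterministic full-data MSE

# Minibatch MSE is a random variable: it varies with the sampled batch,
# but its average over many draws approaches the full-data MSE.
batch_losses = []
for _ in range(10_000):
    idx = rng.choice(len(X), size=32, replace=False)
    batch_losses.append(mse(X[idx], y[idx], theta))

print(f"full-data MSE:        {full_mse:.4f}")
print(f"mean minibatch MSE:   {np.mean(batch_losses):.4f}")
print(f"std of minibatch MSE: {np.std(batch_losses):.4f}")
```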
