[Math] the purpose of an oracle in optimization

convex optimization, definition, nonlinear optimization, numerical optimization, optimization

I am reading a textbook on convex optimization, and in it there was an extremely short discussion of the so-called "oracle model" on page 136, which has left me confused.

Pardon my ignorance but why do people in optimization bother with an "oracle model"?

  1. The oracle model assumes that we do not know our objective function explicitly. Does this actually occur in practice?

  2. The oracle model returns the function value $f(x)$ and perhaps gradient information $\nabla f(x)$. Why the seemingly arbitrary restriction on the type of information that can be "queried" from such an object?

  3. In the text, it says: "If we are given a parameter problem description, we can construct an oracle for it, which simply evaluates the required functions and derivatives when queried." Is this sentence simply saying that if we had a parameter problem description, i.e., if the objective function is known explicitly, then we could write a Python script that calculates its gradient? I don't understand the importance of constructing an oracle, or even what it means to construct an oracle.

All this talk of oracles, queries, and black boxes has left me confused. How can this idea of an oracle be helpful?

Best Answer

In optimization theory and computational practice, there's a difference between having a formula for the function that you want to minimize and having access to a routine that can compute values of that function. If you're given an explicit formula for the objective function, then you can compute its gradient and Hessian (if the function is twice continuously differentiable), determine a Lipschitz constant for the gradient, and so on. If all you have is the ability to query a routine that returns $f(x)$ and perhaps $\nabla f(x)$ at points you choose, then you have much less information to work with.
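In code, a first-order oracle is nothing more than a callable that answers pointwise queries. Here is a minimal sketch (the wrapper `make_first_order_oracle` and the quadratic objective are purely illustrative, not from the textbook):

```python
import numpy as np

def make_first_order_oracle(f, grad_f):
    """Wrap a function and its gradient as a black-box first-order oracle.

    The algorithm calling this oracle can only ask for the value and
    gradient at points of its choosing; it never sees a formula, a
    Hessian, or a Lipschitz constant.
    """
    def oracle(x):
        return f(x), grad_f(x)
    return oracle

# Purely illustrative objective: the convex quadratic f(x) = 0.5 x'Ax - b'x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

oracle = make_first_order_oracle(
    lambda x: 0.5 * x @ A @ x - b @ x,
    lambda x: A @ x - b,
)

fx, gx = oracle(np.zeros(2))  # the only kind of query the model allows
```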

In computational practice, this matters because you are often optimizing functions that are implemented by very complicated programs for which this kind of analysis is difficult, or you may not even have access to the source code. For this reason, you want algorithms that can work with minimal information. Most library routines for optimization have interfaces in which the user supplies subroutines to compute the function and perhaps its gradient or even its Hessian. Of course, if you do have a mathematical formula for the function that you want to minimize, it is relatively simple to write code that implements that function, and that code then serves as the oracle; this is all that "constructing an oracle" from a parameter problem description means.
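SciPy's optimizer interface is a concrete example of this design: the solver only ever calls the user-supplied routines, so from its point of view the objective is a black box. A minimal sketch, reusing the illustrative quadratic from above:

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad_f(x):
    return A @ x - b

# BFGS only ever calls f and grad_f at points it chooses; it never sees A or b.
result = minimize(f, x0=np.zeros(2), jac=grad_f, method="BFGS")
print(result.x)  # should be close to the true minimizer solve(A, b)
```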

In theoretical analysis of optimization algorithms, it is also interesting to see how much can be done with limited information. If you had perfect information about the function you were minimizing, then you could simply write down the minimum; the problem is theoretically interesting precisely because you have limited information. There is a desire to find the best possible optimization algorithm, but "best possible" can only be defined with respect to a specific model of the properties of the function and of the information about the function that is available to the algorithm.

For example, it can be shown that for smooth convex functions, no algorithm that uses only the values of $f(x)$ and $\nabla f(x)$ at points of its choosing can guarantee that $f(x^{k})-f(x^{*})$ decreases faster than $O(1/k^{2})$, where $k$ counts the number of evaluations of $f(x)$ and $\nabla f(x)$. Furthermore, Nesterov's accelerated gradient method is optimal in this sense: it actually achieves the $O(1/k^{2})$ rate.
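As a rough illustration (not from the textbook), here is a minimal sketch of Nesterov's accelerated method applied to the quadratic used above, with the standard step size $1/L$ and momentum schedule; the method only ever asks the oracle for gradients at points it chooses:

```python
import numpy as np

# Illustrative smooth convex objective f(x) = 0.5 x'Ax - b'x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
L = np.linalg.eigvalsh(A).max()  # Lipschitz constant of the gradient

def nesterov(grad, x0, L, iters):
    """Nesterov's accelerated gradient method for a smooth convex function."""
    x, y, t = x0, x0, 1.0
    for _ in range(iters):
        x_next = y - grad(y) / L                        # gradient step at the extrapolated point
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_next + (t - 1.0) / t_next * (x_next - x)  # momentum / extrapolation step
        x, t = x_next, t_next
    return x

x_star = np.linalg.solve(A, b)   # true minimizer, used here only to report the gap
x_k = nesterov(grad, np.zeros(2), L, iters=50)
print(f(x_k) - f(x_star))        # optimality gap shrinks like O(1/k^2)
```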
