[Math] Gradient descent vs ternary search

convex optimization, gradient descent, machine learning, optimization

Consider a strictly convex function $f: [0; 1]^n \rightarrow \mathbb{R}$.
Why do people (machine learning practitioners in particular) use gradient descent to find the global minimum of such a function, rather than an $n$-dimensional ternary search or golden section search?

Here is a list of disadvantages:

  1. Gradient descent requires choosing the step size $\alpha$ experimentally.
  2. Gradient descent also requires the partial derivatives of $f$, which are not always available, for example when $f$ is a "black box" function.
  3. Ternary search is guaranteed to converge within $\Theta(\lg^n \epsilon^{-1})$ iterations, where $\epsilon$ is the required absolute precision (see the one-dimensional sketch after this list), whereas for gradient descent the number of iterations has to be chosen experimentally.
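
To illustrate point 3, here is a minimal sketch of one-dimensional ternary search on a strictly convex function (the test function, the interval $[0, 1]$, and the tolerance `eps` are illustrative choices):

```python
def ternary_search(f, lo, hi, eps=1e-9):
    """Minimize a strictly convex f on [lo, hi] to absolute precision eps."""
    while hi - lo > eps:
        # Compare f at two interior points; each pass discards a third of the
        # interval, so about log_{3/2}(1/eps) iterations are needed.
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2  # the minimum lies in [lo, m2]
        else:
            lo = m1  # the minimum lies in [m1, hi]
    return (lo + hi) / 2

# Example: minimize (x - 0.3)^2 on [0, 1]
print(ternary_search(lambda x: (x - 0.3) ** 2, 0.0, 1.0))  # ~0.3
```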

Or am I misunderstanding something?

Thanks in advance.

Best Answer

There is no "$n$-dimensional ternary search or golden section search". These methods only work in one dimension.

What you can do in principle is use the gradient to determine the direction in which to search, and then apply ternary search or golden section search along that direction. However, you don't want to find the exact minimum along the chosen search direction, because you'll recompute the gradient and minimize along a different line immediately after that anyway. So in practice, you only spend enough time to find a point that guarantees making progress, for example by using the Wolfe conditions.
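
As an illustration, here is a minimal sketch of gradient descent with an inexact backtracking line search that enforces the Armijo sufficient-decrease condition (the first of the two Wolfe conditions); the quadratic test function and the constants `c` and `beta` are illustrative assumptions, not part of the answer above:

```python
import numpy as np

def backtracking_step(f, g, x, d, alpha0=1.0, c=1e-4, beta=0.5):
    """Shrink the step until f(x + alpha*d) <= f(x) + c*alpha*<g, d> (Armijo).

    The full Wolfe conditions add a curvature condition on the gradient at the
    new point; the sufficient-decrease test alone already guarantees progress
    along the search direction d without locating the exact line minimum.
    """
    alpha, fx, slope = alpha0, f(x), g @ d  # slope < 0 for a descent direction
    while f(x + alpha * d) > fx + c * alpha * slope:
        alpha *= beta  # not enough decrease: try a shorter step
    return alpha

def gradient_descent(f, grad, x0, tol=1e-8, max_iter=1000):
    """Search along the negative gradient with an inexact line search."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:  # (near-)stationary point reached
            break
        d = -g  # steepest-descent direction
        x = x + backtracking_step(f, g, x, d) * d
    return x

# Example on the convex quadratic f(x) = ||x - b||^2
b = np.array([0.2, 0.7])
f = lambda x: float(np.sum((x - b) ** 2))
grad = lambda x: 2.0 * (x - b)
print(gradient_descent(f, grad, np.zeros(2)))  # ~ [0.2, 0.7]
```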
