[Math] Summary of Optimization Methods.

Tags: convex optimization, intuition, nonlinear optimization, optimization

Context: In a lot of my self-studies, I come across ways to solve problems that involve optimizing some objective function. (I am coming from a signal processing background.)

Anyway, I seem to be learning about these methods as I trudge along, in that they are only introduced when a problem is at hand. This is fine, I suppose, but I do wish I had more of an overarching mental 'map' of them in my head.

For example, I know about the gradient ascent/descent algorithms, and I have also recently learned about a second-order method, Newton's method, which uses curvature. (I will post a different question about that.)

Gradient Ascent/Descent:

$$
\mathbf{w}_{n+1} = \mathbf{w}_n \pm \alpha\,\frac{\partial J(\mathbf{w})}{\partial \mathbf{w}}
$$

Newton's Method:

$$
\mathbf{w}_{n+1} = \mathbf{w}_n \pm \left[ \frac{\partial^2 J(\mathbf{w})}{\partial \mathbf{w}^2} \right]^{-1} \frac{\partial J(\mathbf{w})}{\partial \mathbf{w}}
$$
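For concreteness, here is a minimal NumPy sketch of one iteration of each update, using a hypothetical quadratic cost (the matrix `A`, vector `b`, and step size `alpha` below are made-up illustration values, not anything from the question) so the gradient and Hessian are known in closed form:

```python
import numpy as np

# Toy quadratic cost J(w) = 0.5 * w^T A w - b^T w (hypothetical example),
# so the gradient is A w - b and the Hessian is A.
A = np.array([[3.0, 0.5],
              [0.5, 2.0]])
b = np.array([1.0, -1.0])

def grad_J(w):
    return A @ w - b

def hess_J(w):
    return A

w = np.zeros(2)
alpha = 0.1

# Gradient descent update: w <- w - alpha * dJ/dw
w_gd = w - alpha * grad_J(w)

# Newton update: w <- w - [d^2 J / dw^2]^{-1} dJ/dw
w_newton = w - np.linalg.solve(hess_J(w), grad_J(w))

print(w_gd, w_newton)
```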

Questions:

1) First off, I would like to ask for a summary of other optimization methods that are similar to the forms above, in that one has to actually compute the first and/or second derivatives of a cost function a priori.

2) My second question is: what optimization methods exist that don't require an explicit equation for the cost function and/or its first/second derivatives? For example, suppose I have a black-box cost function that I want to use in an optimization procedure. What method(s) would be available to me in this case? Obviously, I would not know an explicit equation for the cost function, or any of its derivatives. It would exist simply as cost = blackbox_cost_function(weight vector, data);

Thanks!

Best Answer

1)

First-order methods have many variants, for example using conjugate directions instead of the steepest-descent direction (the Conjugate Gradient method).
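As a quick sketch, SciPy exposes a nonlinear conjugate-gradient solver that only needs the cost and its gradient; the cost `J` and gradient `grad_J` below are hypothetical stand-ins:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical cost and its gradient; the nonlinear conjugate-gradient
# solver in SciPy needs only these two callables.
def J(w):
    return (w[0] - 1.0) ** 2 + 10.0 * (w[1] + 2.0) ** 2

def grad_J(w):
    return np.array([2.0 * (w[0] - 1.0), 20.0 * (w[1] + 2.0)])

result = minimize(J, x0=np.zeros(2), jac=grad_J, method='CG')
print(result.x)  # should approach [1, -2]
```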

There is also a multitude of "line search" algorithms for computing the step length in first-order methods. These include bracketing searches (e.g. golden-section search) as well as Newton and quasi-Newton methods, where a second-order method is used to compute the step length inside a first-order method. This may seem odd, but the point is that the first-order method works in many dimensions, while the second-order line search is applied only to a univariate objective: the restriction of the cost function to the line along the step direction.
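Here is a simplified sketch of golden-section line search under the assumption that the cost is unimodal on the bracket; the helper name `golden_section` and the bracket `[0, 1]` are illustrative choices, not a standard API:

```python
import math

def golden_section(phi, a, b, tol=1e-6):
    """Minimise a unimodal 1-D function phi on the bracket [a, b]."""
    inv_gr = (math.sqrt(5.0) - 1.0) / 2.0   # 1 / golden ratio, ~0.618
    c = b - inv_gr * (b - a)
    d = a + inv_gr * (b - a)
    while abs(b - a) > tol:
        if phi(c) < phi(d):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
        c = b - inv_gr * (b - a)
        d = a + inv_gr * (b - a)
    return 0.5 * (a + b)

# Hypothetical use inside gradient descent: choose the step length alpha
# that minimises the cost along the current descent direction.
# alpha = golden_section(lambda t: J(w - t * grad_J(w)), 0.0, 1.0)
```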

In theory, you can also use numerical derivatives, i.e. compute them from the cost function itself. This allows you to use first-order methods without an analytical expression for the derivatives. Methods for computing derivatives include finite-difference approximation, Neville's method, Ridders' method and automatic differentiation.
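A minimal central finite-difference sketch, assuming the cost `J` accepts a NumPy vector (the function name and step size `h` are illustrative):

```python
import numpy as np

def numerical_gradient(J, w, h=1e-6):
    """Central finite-difference approximation of dJ/dw at the point w."""
    w = np.asarray(w, dtype=float)
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = h
        g[i] = (J(w + e) - J(w - e)) / (2.0 * h)
    return g

# Example with a hypothetical cost: gradient of sum(w^2) at [1, 2] is ~[2, 4].
print(numerical_gradient(lambda w: np.sum(w ** 2), [1.0, 2.0]))
```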

Second-order methods like Gauss-Newton and Levenberg-Marquardt do not use the explicit Hessian; for a least-squares cost they approximate it from the Jacobian $\mathbf{J}$ of the residual vector (not to be confused with the cost $J(\mathbf{w})$ above):

$$H \approx \mathbf{J}^{\mathsf{T}}\mathbf{J}$$

The full reasoning behind this approximation (which requires no explicit second-order derivatives) is outside the scope of this answer, but such a Hessian tends to be more numerically stable, because the neglected second-order terms are noise-sensitive.
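A sketch of one Gauss-Newton step under this approximation; `residual` and `jacobian` are hypothetical user-supplied callables, and the step assumes $\mathbf{J}^{\mathsf{T}}\mathbf{J}$ is nonsingular:

```python
import numpy as np

def gauss_newton_step(residual, jacobian, w):
    """One Gauss-Newton update for the least-squares cost
    J(w) = 0.5 * ||r(w)||^2, using H ~ Jac^T Jac instead of the true Hessian.

    `residual(w)` returns the length-m residual vector r(w);
    `jacobian(w)` returns its m x n Jacobian at w.
    """
    r = residual(w)
    Jac = jacobian(w)
    g = Jac.T @ r              # gradient of the cost J(w)
    H = Jac.T @ Jac            # Gauss-Newton Hessian approximation
    return w - np.linalg.solve(H, g)
```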


2)

There are many methods for derivative-free optimization, some of which are designed for large and sparse problems like the black-box cost function you have. Such methods include model-based methods, coordinate- and pattern-search methods, the conjugate-direction method, the Nelder-Mead method and implicit filtering.
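For the `cost = blackbox_cost_function(weight vector, data)` situation in the question, a derivative-free solver such as Nelder-Mead only needs to call the cost. The toy `blackbox_cost_function` and `data` below are made-up stand-ins for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-in for the opaque cost in the question (here: line-fitting error).
# In practice you would only be able to call it, not inspect it.
def blackbox_cost_function(w, data):
    t = np.arange(data.size)
    return np.sum((data - (w[0] * t + w[1])) ** 2)

data = np.array([1.0, 3.1, 4.9, 7.2, 9.0])
w0 = np.zeros(2)

# Nelder-Mead uses only function values: no gradients or Hessians required.
result = minimize(lambda w: blackbox_cost_function(w, data), w0,
                  method='Nelder-Mead')
print(result.x, result.fun)
```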

A careful design of your cost function may also allow you to omit unknown data without harming the result too much.


The following books helped me greatly, and both go into detail on high-fidelity optimization (large and sparse datasets, unknown derivatives):

Jorge Nocedal, Stephen J. Wright: "Numerical Optimization, Second Edition"

Press, Teukolsky, Vetterling, Flannery: "Numerical Recipes, Third Edition"
