Solved – Selecting optimization algorithms

optimization, r

I am in the process of developing a multi-state model (using the msm package in R), and have been encountering the error message

initial value in 'vmmin' is not finite

This error occurs regardless of whether I scale the parameters, and with almost every optimization algorithm except bobyqa (from the minqa package). Essentially, I experimented until something worked, but that does not necessarily mean the choices I made are methodologically sound. I've tried to read up on this topic, but every resource I find launches straight into heavy mathematical jargon.
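For reference, here is a stripped-down sketch of the kind of call I'm making, using the cav example data that ships with msm in place of my real data; the initial intensity values are placeholders, and the optimizer is switched via opt.method:

    library(msm)

    # Placeholder transition intensity matrix (initial values only, not my real ones)
    Q <- rbind(c(-0.5,   0.25,   0,     0.25),
               c( 0.166, -0.498, 0.166, 0.166),
               c( 0,      0.25, -0.5,   0.25),
               c( 0,      0,     0,     0))

    # With the default optimizer I get "initial value in 'vmmin' is not finite";
    # switching to the minqa optimizer runs through:
    fit <- msm(state ~ years, subject = PTNUM, data = cav,
               qmatrix = Q, deathexact = 4,
               opt.method = "bobyqa")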

My question is – how does one go about choosing an optimization algorithm? What do I need to take into consideration when making the selection, and to what extent does the choice of optimization algorithm affect how I interpret the results produced with it? Also, I would be grateful if someone could recommend a resource that explains optimization and scaffolds well.

Best Answer

There are a number of things to consider:

  • Global vs. local optimization
  • Do you have analytic gradients or Hessians?
  • Is the objective function even differentiable?
  • The size of the problem/data set
  • How much time you have
  • Whether the problem has special structure (e.g. least squares) that a particular algorithm can exploit

While there are algorithms that can find global extrema (simulated annealing, various genetic algorithms), they can take an inordinately long time for very little return.
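As a quick toy illustration (the test function and starting point are my own arbitrary choices, nothing to do with msm), you can compare a local quasi-Newton method with simulated annealing via optim's "SANN" method on a deliberately multimodal function; the stochastic method needs many more function evaluations and is still not guaranteed to do better:

    # Rastrigin function: many local minima, global minimum 0 at the origin
    rastrigin <- function(x) 10 * length(x) + sum(x^2 - 10 * cos(2 * pi * x))

    x0 <- c(3.1, -2.9)                                    # arbitrary starting point
    local_fit <- optim(x0, rastrigin, method = "BFGS")    # local quasi-Newton
    sann_fit  <- optim(x0, rastrigin, method = "SANN",    # simulated annealing
                       control = list(maxit = 20000))
    c(bfgs = local_fit$value, sann = sann_fit$value)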

If you have analytic gradients or Hessians, quasi-Newton methods become much more appealing. If you have Hessians, you can use trust-region methods, although with some exceptions (e.g. R's trustOptim) these tend to bog down on large problems because the Hessian grows rather quickly. If you have gradients, you can use BFGS-type methods. If you don't, you can either use gradient-free methods like Nelder-Mead, Subplex (my favorite), COBYLA, BOBYQA, etc., or use the quasi-Newton methods with a finite-difference approximation of the gradient (as is the default in R's optim), but only if your objective function is differentiable. If you are looking at a least-squares problem specifically, Levenberg-Marquardt is a possibility.
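To make the gradient part concrete, here is a minimal sketch on the Rosenbrock function (essentially the example from ?optim, not anything specific to your model): the same problem solved with an analytic gradient, with optim's default finite-difference gradient, and with a derivative-free method:

    # Rosenbrock banana function and its analytic gradient
    fr  <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
    grr <- function(x) c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
                          200 * (x[2] - x[1]^2))

    optim(c(-1.2, 1), fr, grr, method = "BFGS")         # quasi-Newton, analytic gradient
    optim(c(-1.2, 1), fr,      method = "BFGS")         # gradient by finite differences
    optim(c(-1.2, 1), fr,      method = "Nelder-Mead")  # derivative-free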

I believe it does boil down to the specifics of your problem, although I've personally had the most success with Subplex-type algorithms (a restarted Nelder-Mead applied to subspaces of the parameters).
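If you want to try Subplex in R, one implementation I'm aware of is sbplx() in the nloptr package (there is also a standalone subplex package); a minimal sketch, assuming nloptr's wrapper interface and reusing the Rosenbrock function from above:

    library(nloptr)

    # Rosenbrock again; sbplx() is nloptr's Subplex wrapper
    fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
    sbplx(x0 = c(-1.2, 1), fn = fr)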