Solved – Advantages of Particle Swarm Optimization over Bayesian Optimization for hyperparameter tuning

bayesian optimization, hyperparameter, optunity

There's substantial contemporary research on Bayesian Optimization (1) for tuning ML hyperparameters. The driving motivation is that objective function calls are expensive, so an optimizer should need as few data points as possible to make informed choices about which points are worth trying. Training a model is time-intensive: some modestly large SVM problems I've worked on take anywhere from minutes to hours to complete.
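
To make the cost concrete, a single objective function call in this setting is itself a full cross-validated model fit, roughly like the hypothetical sketch below (the dataset, parameter ranges, and fold count are illustrative only):

```python
# Hypothetical sketch of the expensive objective: one call = one cross-validated
# SVM fit. The dataset and parameter ranges are made up for illustration.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

def objective(log_C, log_gamma):
    """Mean 5-fold CV accuracy of an RBF SVM; every call refits the model 5 times."""
    model = SVC(C=10 ** log_C, gamma=10 ** log_gamma)
    return cross_val_score(model, X, y, cv=5).mean()

print(objective(0.0, -2.0))  # even a single evaluation takes noticeable time
```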

On the other hand, Optunity is a particle swarm optimization (PSO) implementation that addresses the same task. I'm not overwhelmingly familiar with PSO, but it seems that it must be less efficient, in the sense of requiring a larger number of trial points, and therefore objective function evaluations, to assess the hyperparameter surface.
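
For intuition about where that evaluation budget goes, here is a bare-bones PSO loop (a generic sketch, not Optunity's implementation): every generation re-evaluates every particle, so the budget is roughly num_particles × (num_generations + 1) objective calls.

```python
# Generic PSO sketch (not Optunity's code): the evaluation budget is explicit,
# roughly num_particles * (num_generations + 1) objective calls.
import numpy as np

def pso_maximize(f, bounds, num_particles=10, num_generations=10,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    pos = rng.uniform(lo, hi, size=(num_particles, len(bounds)))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([f(*p) for p in pos])
    gbest = pbest[pbest_val.argmax()].copy()
    for _ in range(num_generations):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([f(*p) for p in pos])  # num_particles evaluations per generation
        improved = vals > pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmax()].copy()
    return gbest, pbest_val.max()

# e.g. pso_maximize(objective, bounds=[(-5, 2), (-5, 1)]) spends ~110 evaluations
```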

Am I missing a key detail that makes PSO preferable to BO in the machine learning context? Or is the choice between the two inherently contextual for the hyperparameter tuning task?


(1) Shahriari et al., "Taking the Human Out of the Loop: A Review of Bayesian Optimization."

Best Answer

As the lead developer of Optunity I'll add my two cents.

We have done extensive benchmarks comparing Optunity with the most popular Bayesian solvers (e.g., hyperopt, SMAC, bayesopt) on real-world problems, and the results indicate that PSO is in fact not less efficient in many practical cases. In our benchmark, which consists of tuning SVM classifiers on various datasets, Optunity is actually more efficient than hyperopt and SMAC, but slightly less efficient than BayesOpt. I would love to share the results here, but I'm going to wait until Optunity is finally published in JMLR (under review for over a year now, so don't hold your breath ...).

As you indicate, increased efficiency is a commonly used selling point for Bayesian optimization, but in practice it only holds water if the assumptions of the underlying surrogate models hold, which is far from trivial. In our experiments, Optunity's very simple PSO solver is often competitive with complex Bayesian approaches in terms of number of function evaluations. Bayesian solvers work very well when provided with good priors, but with an uninformative prior there is virtually no structural benefit over metaheuristic methods like PSO in terms of efficiency.
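
As a concrete illustration of where the prior enters in practice, in a sequential model-based library such as hyperopt the search-space definition itself encodes the prior over each hyperparameter. The sketch below is illustrative only; the dataset, ranges, and budget are made up.

```python
# Illustrative only: in hyperopt-style model-based optimization, the prior over
# each hyperparameter is part of the search-space definition (here log-uniform
# ranges for an RBF-SVM's C and gamma).
import numpy as np
from hyperopt import fmin, tpe, hp
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

space = {
    "C": hp.loguniform("C", np.log(1e-3), np.log(1e3)),
    "gamma": hp.loguniform("gamma", np.log(1e-4), np.log(1e1)),
}

def loss(params):
    # hyperopt minimizes, so return the negative cross-validated accuracy
    svm = SVC(C=params["C"], gamma=params["gamma"])
    return -cross_val_score(svm, X, y, cv=3).mean()

best = fmin(fn=loss, space=space, algo=tpe.suggest, max_evals=25)
print(best)
```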

A big selling point for PSO is the fact that it's embarrassingly parallel. Bayesian optimization is often hard to parallelize due to its inherently sequential nature (hyperopt's implementation being the only real exception). Given opportunities to distribute computation, which is becoming the norm, Optunity quickly takes the lead in terms of wall-clock time to obtain good solutions.
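
A minimal sketch of the parallelism argument (not Optunity's internals): one PSO generation is just a batch of independent evaluations, so a plain worker pool covers it, whereas a sequential BO loop has to wait for each result before proposing the next point. The objective below is a toy placeholder.

```python
# Sketch of why population-based search parallelizes trivially: all particles in
# a generation are independent, so a plain process pool evaluates them at once.
import time
from multiprocessing import Pool

def expensive_objective(params):
    """Stand-in for training and cross-validating a model with these hyperparameters."""
    C, gamma = params
    time.sleep(1)                                   # pretend the model fit takes a while
    return -(C - 1.0) ** 2 - (gamma - 0.1) ** 2     # toy score

if __name__ == "__main__":
    generation = [(0.1, 0.01), (1.0, 0.1), (10.0, 1.0), (100.0, 0.001)]  # one swarm
    with Pool() as pool:
        scores = pool.map(expensive_objective, generation)  # all particles concurrently
    print(scores)
    # A sequential Bayesian loop would instead evaluate one point, update its
    # surrogate model, propose the next point, and repeat.
```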

Another key difference between Optunity and most other dedicated hyperparameter optimization libraries is the target audience: Optunity has the simplest interface and is targeted towards non-machine-learning experts, whereas most other libraries require some understanding of Bayesian optimization to use effectively (i.e., they are targeted towards specialists).
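
To give a sense of what that interface looks like, here is a sketch along the lines of the SVM example in the Optunity documentation; the dataset is synthetic and the exact call signatures should be treated as approximate rather than authoritative.

```python
# Sketch based on the Optunity documentation's SVM tuning example; the dataset
# is synthetic and the signatures are approximate, not a verbatim copy of the API.
import optunity
import optunity.metrics
import sklearn.svm
from sklearn.datasets import make_classification

data, labels = make_classification(n_samples=500, n_features=20, random_state=0)

@optunity.cross_validated(x=data, y=labels, num_folds=10, num_iter=2)
def svm_auc(x_train, y_train, x_test, y_test, logC, logGamma):
    model = sklearn.svm.SVC(C=10 ** logC, gamma=10 ** logGamma).fit(x_train, y_train)
    decision_values = model.decision_function(x_test)
    return optunity.metrics.roc_auc(y_test, decision_values)

# The default solver is particle swarm; the user only supplies box constraints.
optimal_pars, _, _ = optunity.maximize(svm_auc, num_evals=200,
                                       logC=[-5, 2], logGamma=[-5, 1])
print(optimal_pars)
```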

The reason we made the library is that, despite the fact that dedicated hyperparameter optimization methods exist, they lack adoption in practice. Most people are still either not tuning at all, tuning manually, or using naive approaches like grid or random search. In our opinion, a key reason for this is that the libraries that existed before we developed Optunity were too difficult to use in terms of installation, documentation, and API, and were often limited to a single environment.
