Solved – What are some of the disadvantages of Bayesian hyperparameter optimization?

bayesian-optimization, hyperparameter, machine-learning, optimization

I am fairly new to machine learning and statistics, but I was wondering why Bayesian optimization is not mentioned more often in machine learning resources as a way to optimize an algorithm's hyperparameters, for example using a framework like this one: https://github.com/fmfn/BayesianOptimization

Does Bayesian optimization of your hyperparameters have any limitations or major disadvantages compared to techniques like grid search or random search?
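(For context, using that package looks roughly like the sketch below, modeled on its README example; the exact API may differ between versions, and `black_box_function` is just a hypothetical stand-in for a real training-and-validation run.)

```python
# Sketch based on the fmfn/BayesianOptimization README; API details may vary by version.
from bayes_opt import BayesianOptimization

# Hypothetical stand-in for the expensive function whose hyperparameters we want to tune.
def black_box_function(x, y):
    return -x ** 2 - (y - 1) ** 2 + 1

optimizer = BayesianOptimization(
    f=black_box_function,
    pbounds={"x": (2, 4), "y": (-3, 3)},  # box constraints on each hyperparameter
    random_state=1,
)

optimizer.maximize(init_points=2, n_iter=10)  # 2 random probes, then 10 BO steps
print(optimizer.max)  # best parameters and target value found
```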

Best Answer

  1. Results are sensitive to the parameters of the surrogate model, which are typically fixed at some value; this underestimates uncertainty. The alternative is to be fully Bayesian and marginalize over the surrogate's hyperparameter distributions, which can be expensive and unwieldy. (The sketch after this list marks where each of these four costs shows up.)
  2. It takes a dozen or so samples to get a good surrogate surface over a 2- or 3-dimensional search space; increasing the dimensionality of the search space requires yet more samples.
  3. Bayesian optimization itself depends on an inner optimizer to search the surrogate surface, which has its own costs -- this inner problem is (hopefully) cheaper to evaluate than the original one, but it is still a non-convex, box-constrained optimization problem (i.e., difficult!).
  4. Estimating the BO model (the surrogate) itself has costs; with a Gaussian process surrogate, for instance, fitting scales cubically in the number of observations.
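To make these points concrete, here is a minimal, self-contained sketch of a BO loop with a Gaussian-process surrogate and an expected-improvement acquisition. The 1-D `objective` is a hypothetical stand-in for an expensive experiment, and the comments mark where each of the four costs above appears:

```python
import numpy as np
from scipy.stats import norm

# Toy 1-D objective we pretend is expensive to evaluate (hypothetical stand-in).
def objective(x):
    return -np.sin(3 * x) - x ** 2 + 0.7 * x

# --- Point 1: the surrogate's own hyperparameters are simply fixed here. -----
# A GP surrogate with a unit-amplitude squared-exponential kernel; length-scale
# and jitter are hard-coded, whereas a fully Bayesian treatment would
# marginalize over them.
LENGTH_SCALE, JITTER = 0.5, 1e-6

def kernel(a, b):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / LENGTH_SCALE ** 2)

def gp_posterior(x_query, x_obs, y_obs):
    # --- Point 4: these solves are O(n^3) in the number of observations. -----
    K = kernel(x_obs, x_obs) + JITTER * np.eye(len(x_obs))
    K_s = kernel(x_query, x_obs)
    mu = K_s @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.einsum("ij,ji->i", K_s, np.linalg.solve(K, K_s.T))
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
# --- Point 2: a handful of initial samples just to seed the surrogate. -------
x_obs = rng.uniform(-2, 2, size=5)
y_obs = objective(x_obs)

for _ in range(15):
    # --- Point 3: the inner optimization of the acquisition function. Here it
    # is a crude dense grid over [-2, 2]; in higher dimensions this becomes a
    # hard non-convex, box-constrained problem in its own right. --------------
    candidates = np.linspace(-2, 2, 2001)
    mu, sigma = gp_posterior(candidates, x_obs, y_obs)
    x_next = candidates[np.argmax(expected_improvement(mu, sigma, y_obs.max()))]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

print("best x:", x_obs[np.argmax(y_obs)], "best value:", y_obs.max())
```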

To state it another way, BO is an attempt to keep the number of function evaluations to a minimum and get the most "bang for the buck" from each evaluation. This matters if you're conducting destructive tests, or running a simulation that takes an obscene amount of time to execute. But in all but the most expensive cases, apply pure random search and call it a day! (Or LIPO, if your problem is amenable to its assumptions.) It can save you a number of headaches, such as having to optimize your Bayesian optimization program itself.
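For comparison, pure random search needs none of that machinery. A minimal sketch, where `validation_score` and its two parameters are hypothetical stand-ins for your actual training-and-validation run:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a real training-and-validation run.
def validation_score(learning_rate, num_leaves):
    return -(np.log10(learning_rate) + 2) ** 2 - (num_leaves - 40) ** 2 / 1000

best_score, best_params = -np.inf, None
for _ in range(60):  # 60 independent evaluations, no surrogate to fit or tune
    params = {
        "learning_rate": 10 ** rng.uniform(-4, 0),  # log-uniform over [1e-4, 1]
        "num_leaves": int(rng.integers(8, 256)),
    }
    score = validation_score(**params)
    if score > best_score:
        best_score, best_params = score, params

print(best_params, best_score)
```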