For some of these parameters you can simply reason about what the range should be. For example, since the sparsity parameter is the desired average activation of the hidden units, it only makes sense if it lies in $(0, 1)$, assuming you're using a sigmoid activation. You can narrow this down further, though, since a sparsity $> 0.5$ is somewhat nonsensical. For other parameters you can look for the values people commonly report in the literature, use those as a rough guess, and expand your ranges if you find it necessary. For example, I would probably start with $(0, 0.2)$ for the learning rate.
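To make that concrete, here is a rough sketch of how such starting ranges might be written down; the exact endpoints are just the illustrative guesses from above, not prescriptions.

```python
# Hand-picked starting ranges for a grid/random search; the endpoints below are
# illustrative guesses based on the reasoning above, not recommended values.
param_ranges = {
    "sparsity_target": (0.01, 0.5),  # average hidden activation; > 0.5 is rarely sensible
    "learning_rate":   (0.0, 0.2),   # a common starting interval in the literature
}
```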
Another useful place to look would be Geoff Hinton's practical guide to training restricted Boltzmann machines. I imagine most of this advice would be applicable for autoencoders as well.
This is going to be very dependent on the algorithm, but I can answer for glmnet.
Glmnet is fit through cyclical coordinate descent: we apply an update rule to each coordinate in turn and iterate until convergence. The update rule is equation 5 in this paper.
As the authors note, this update rule implies that, if some parameter $\beta_j$ is zero at some iteration, then it stays zero after the iteration if and only if
$$ \left| \langle x_j, y \rangle \right| < N \lambda \alpha $$
where $N$ is the number of training data points. If we let $\tilde \lambda$ denote the minimum lambda for which all parameters are estimated as zero (the set of all such lambdas is clearly an unbounded interval including infinity), the above relation tells us that
$$ N \alpha \tilde \lambda = \max_j \left| \langle x_j, y \rangle \right| $$
that is, all parameters are estimated as zero when
$$ \lambda > \frac{1}{N \alpha} \max_j \left| \langle x_j, y \rangle \right| $$
This gives us bounds on our grid search:
- When $ \lambda = 0$ we get the un-regularized model.
- When $ \lambda > \tilde \lambda $ all parameters are zero.
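As a rough illustration, here is a small numpy sketch of that bound. It assumes $X$ is an $(N, p)$ design matrix with standardized columns (as glmnet standardizes by default) and $y$ is the centered response; this is not glmnet's actual code.

```python
# Sketch: smallest lambda at which every coefficient is estimated as exactly zero,
# lambda_tilde = max_j |<x_j, y>| / (N * alpha). Assumes standardized columns of X.
import numpy as np

def lambda_tilde(X, y, alpha=1.0):
    N = X.shape[0]
    return np.max(np.abs(X.T @ y)) / (N * alpha)

# Illustrative check on synthetic data
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
y = rng.standard_normal(100)
print(lambda_tilde(X, y))
```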
To fill in the middle, glmnet breaks up this interval into multiplicatively even subintervals, i.e., subintervals whose endpoints are equally spaced on the logarithmic scale. My assumption is that this was done based on some empirical evidence, but I'm not really sure.
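Such a log-spaced grid is easy to construct; the sketch below reuses the `lambda_tilde` function from above, and the lower endpoint (a fixed fraction of $\tilde\lambda$) and the 100 grid points are my own illustrative choices, not values quoted from glmnet.

```python
# Sketch of a multiplicatively even grid: points equally spaced on the log scale,
# running from lambda_tilde down to an (assumed) small fraction of it.
import numpy as np

lam_max = lambda_tilde(X, y)                     # from the sketch above
lam_min = 1e-3 * lam_max                         # assumed lower endpoint
grid = np.geomspace(lam_max, lam_min, num=100)   # decreasing, log-spaced
```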
Best Answer
One nice option, which is at the same time very easy to implement, is Random Search for Hyper-Parameter Optimization. It is also implemented in scikit-learn (see section 3.2.2).
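For illustration, here is a minimal sketch using scikit-learn's `RandomizedSearchCV`; the estimator, the log-uniform range for `C`, and the 60 iterations are illustrative choices of mine, not part of the original method.

```python
# Minimal random-search sketch with scikit-learn. Parameter values are sampled
# from distributions rather than evaluated on a fixed grid.
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

param_distributions = {"C": loguniform(1e-4, 1e2)}  # assumed search range

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions,
    n_iter=60,        # ~60 random draws, see the probability argument below
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```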
The idea is to have an algorithm that yields an optimal result with high probability while being computationally very efficient (compared to brute-force or more naive approaches like grid search). How is it able to prune the search space efficiently?
The basic idea is that not all parameters are equally important to tune in order to get good results (i.e., the optimal solution lies in a low-dimensional submanifold of the search space), so we should find a way to focus on that region of interest.
What they do is model the probability distribution over the hyperparameter space, which reflects the idea that configurations yielding good results are concentrated around the true optimum. Now, for any probability distribution with a finite maximum the following holds: take the region around the maximum which accounts for 5% of the total probability. If you take $n$ samples at random, each one of them has a 0.05 chance of falling within that region, so with probability $(1-0.05)^n$ they all miss it.
Now they ask: how many samples $n$ do I need to take so that, with high probability (say 95%), at least one of them falls in that region? $$ 1-(1-0.05)^n \ge 0.95 $$ which yields $n \ge 60$. This approach has been used with success in deep learning, where the number of hyperparameters is huge.
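For completeness, a quick check of that bound; the exact threshold works out to 59 draws, in line with the rule of thumb of roughly 60 quoted above.

```python
# Smallest n with 1 - (1 - 0.05)**n >= 0.95
import math

n = math.ceil(math.log(0.05) / math.log(0.95))
print(n)  # 59 -- consistent with the usual rule of thumb of about 60 random draws
```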