Solved – Hyperparameter optimization with random search

hyperparameter, machine learning, optimization

I would like to do a random search for hyperparameter optimization. The procedure can be found in link. One possibility is to define a fine grid and take random combinations. A better approach would be to define a distribution for each parameter.

I'm thinking about optimizing SVM with parameters C and gamma (RBF kernel) and also k-nearest neighbours. Of course C and gamma are continuous while k is discrete.

What distributions can be chosen for C, gamma and k?

Second, let's assume a random search optimization over parameters which have to sum to one. How could one incorporate this constraint into the search?

Best Answer

What distributions can be chosen for C, gamma and k?

To reproduce results from other methods, define a box and sample uniformly in the box. This will parallel the procedure of grid search, or any other tuning method, since each point is equally likely a priori.
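As a minimal sketch of sampling uniformly in a box, the snippet below draws candidate settings for the SVM and for k-nearest neighbours. The bounds `C_RANGE`, `GAMMA_RANGE` and `K_RANGE` and the helper names are illustrative assumptions, not values taken from the question; suitable ranges depend on the data.

```python
import random

# Hypothetical box bounds (assumptions for illustration only).
C_RANGE = (0.1, 100.0)     # continuous range for the SVM penalty C
GAMMA_RANGE = (1e-4, 1.0)  # continuous range for the RBF kernel width gamma
K_RANGE = (1, 30)          # inclusive integer range for k in k-nearest neighbours


def sample_svm_params(rng):
    """Draw one (C, gamma) pair uniformly from the box."""
    return {
        "C": rng.uniform(*C_RANGE),
        "gamma": rng.uniform(*GAMMA_RANGE),
    }


def sample_knn_params(rng):
    """Draw k uniformly from the integers in K_RANGE (both ends included)."""
    return {"k": rng.randint(*K_RANGE)}


rng = random.Random(0)  # fixed seed so the candidate list is reproducible
svm_candidates = [sample_svm_params(rng) for _ in range(20)]
knn_candidates = [sample_knn_params(rng) for _ in range(20)]
```

Each candidate would then be scored with whatever validation scheme is already in use (for example cross-validation), and the best-scoring one kept.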

If you want distributions that are more informative than the uniform one, you'll have to work them out for the problem at hand, because that is inherently a context-dependent question: some problems call for larger or smaller $\gamma$ and $C$ than others, which is why we tune hyper-parameters in the first place.

If you decide to make this a fully Bayesian problem with informative probabilities over hyper-parameters, embedding the problem as a logistic regression can create a direct path to probability models.

Second, let's assume a random search optimization over parameters which have to sum to one. How could one incorporate this constraint into the search?

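Use a stick-breaking process. You start with the unit interval and pick a point in it according to a probability distribution over the unit interval. The piece on one side of the point is your first value. Then you repeat the process on the remaining interval "to the right" (or left) of the chosen point, rescaling the distribution to that interval, until you have made $k-1$ breaks in total. The $k-1$ broken-off pieces plus the final remainder give you $k$ values which sum to 1.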
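The stick-breaking procedure can be sketched in a few lines. This is one possible reading, assuming the break point is drawn uniformly on the remaining stick; the function name `stick_breaking` is hypothetical, and any other distribution on the unit interval could be substituted for `rng.random()`.

```python
import random


def stick_breaking(k, rng):
    """Return k non-negative weights that sum to 1, using k - 1 breaks."""
    if k < 1:
        raise ValueError("k must be at least 1")
    weights = []
    remaining = 1.0  # length of the stick that has not been assigned yet
    for _ in range(k - 1):
        # Assumption: the break fraction is uniform on [0, 1).
        fraction = rng.random()
        piece = fraction * remaining
        weights.append(piece)
        remaining -= piece
    # Whatever is left of the stick becomes the last weight.
    weights.append(remaining)
    return weights


rng = random.Random(0)  # fixed seed for reproducibility
weights = stick_breaking(4, rng)
```

Note that with uniform break fractions the earlier weights tend to be larger than the later ones, so the choice of distribution over the unit interval affects how the samples are spread across the simplex.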

You could also review the Stan documentation on sampling simplex random variables for an alternative presentation of the concept.