Solved – Why use a logarithmic scale for hyper-parameter optimization?

hyperparameter, machine learning, optimization

I'm using random search for hyper-parameter optimization of a machine learning pipeline. For the C and gamma parameters of an SVM, for example, it is recommended to use logarithmically spaced values. Why should I use such values? If I use logarithmically spaced values from $2^{-5}$ to $2^{15}$, there will be many more values near $2^{-5}$ (i.e. near zero) than near $2^{15}$.

Best Answer

... because a logarithmic scale lets us cover a much larger range quickly. In your SVM example, we do not know the right range for the hyper-parameter in advance, so a quicker approach is to try dramatically different values, say 1, 10, 100, 1000, which are evenly spaced on a logarithmic scale.
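A minimal sketch of what this means for random search, using only Python's standard library and the range $2^{-5}$ to $2^{15}$ from the question. The helper names `log_uniform` and `fraction_below` are illustrative, not from any library. The sketch draws C either with a uniformly distributed exponent or uniformly on the raw scale, then counts how many draws fall below $2^5 = 32$, which is the lower half of the exponent range.

```python
import math
import random


def log_uniform(low, high, rng):
    """Draw one value whose log2 is uniform on [log2(low), log2(high)]."""
    return 2.0 ** rng.uniform(math.log2(low), math.log2(high))


def fraction_below(samples, threshold):
    """Fraction of samples strictly smaller than threshold."""
    return sum(s < threshold for s in samples) / len(samples)


rng = random.Random(0)  # fixed seed so the run is reproducible
low, high, n = 2.0 ** -5, 2.0 ** 15, 10_000

log_samples = [log_uniform(low, high, rng) for _ in range(n)]
lin_samples = [rng.uniform(low, high) for _ in range(n)]

# Half of the exponent range [-5, 15] lies below 2**5 = 32.
print(fraction_below(log_samples, 2.0 ** 5))  # close to 0.5
print(fraction_below(lin_samples, 2.0 ** 5))  # about 32 / 32768, i.e. roughly 0.001
```

With log-uniform draws, about half of the trials explore C below 32; with raw uniform draws almost none do, so the small orders of magnitude are barely tried. If SciPy is available, `scipy.stats.loguniform(a, b)` provides the same distribution and can be passed as a parameter distribution to scikit-learn's `RandomizedSearchCV`.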

In addition, I think a log-scale search is best treated as a first step. Suppose we find that C=10 is better than C=1 or C=100; then we can focus on that order of magnitude and search it more finely for a better value.
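The coarse-to-fine idea can be sketched in Python as follows. `validation_score` is a hypothetical stand-in for the cross-validated score of a model trained with a given C (here an invented curve peaking near C = 12), so the numbers illustrate the procedure, not real results.

```python
import math


def validation_score(c):
    """Hypothetical stand-in for cross-validated accuracy; peaks near C = 12."""
    return -(math.log10(c) - math.log10(12.0)) ** 2


# Stage 1: coarse log-scale search over several orders of magnitude.
coarse = [10.0 ** k for k in range(-2, 4)]  # 0.01, 0.1, 1, 10, 100, 1000
best_coarse = max(coarse, key=validation_score)

# Stage 2: finer search within one decade either side of the coarse winner,
# still evenly spaced in log space (steps of 10**0.1).
fine = [best_coarse * 10.0 ** (k / 10) for k in range(-10, 11)]
best_fine = max(fine, key=validation_score)

print(best_coarse)           # 10.0
print(round(best_fine, 2))   # 12.59
```

In practice `validation_score` would be replaced by an actual cross-validation run. The second stage still uses log spacing, just over a narrower range around the coarse winner.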

Another reason concerns "regularization" parameters such as C in an SVM: they are not very sensitive to small changes. In other words, we may see little difference between 10, 15, or 20, but the results can be very different between 10 and 1000. That is why we start with a log-scale search.
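To make this concrete, assume (as the answer does) that what matters for C is its ratio to other values rather than its absolute difference, and take the question's range. Sampling the exponent uniformly gives every interval with the same ratio the same probability:

$$
\log_2 C \sim \mathcal{U}(-5,\, 15)
\quad\Longrightarrow\quad
P(x \le C \le kx) = \frac{\log_2(kx) - \log_2 x}{15 - (-5)} = \frac{\log_2 k}{20}
\quad \text{for any } [x,\, kx] \subseteq [2^{-5},\, 2^{15}].
$$

So $P(10 \le C \le 20) = P(500 \le C \le 1000) = 1/20$, whereas going from 10 to 1000 crosses $\log_2 100 \approx 6.6$ of the 20 octaves. This also explains the observation in the question: in absolute terms most draws are close to $2^{-5}$, but each factor of two receives the same share of the search.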