[Math] Robust black box function minimization with extremely expensive cost function

Tags: applications, global-optimization, oc.optimization-and-control

There is an enormous amount of information about the common applied-math problem of minimizing a function: software packages, hundreds of books, research papers, and so on.
But I still have not found a good reference for the case where the function being sampled is extremely expensive.

My specific problem is an applied one from computer science: I have a simulation whose databases have a dozen parameters that affect voxel sizes, cache distribution, tree branching, etc. These numeric parameters don't affect the algorithm's correctness, only its runtime. I want to minimize runtime.

Thus I can treat the problem as black-box minimization: my cost function is runtime, which I want to minimize, and I have no derivatives.
I have a decent starting point and even rough scales for each parameter. There can be interactions and correlations between parameters, and even noise in the time measurements (luckily small).

So why not just throw this into a standard least-squares minimization tool from any package out there? Because each of my timing samples takes 8 hours to run, so each data point is precious, and the algorithms I find tend to ignore this cost. A classic Levenberg-Marquardt procedure freely "spends" samples and doesn't even remember the full history of the samples taken (instead it updates some average statistics).

So my question is asking for pointers to iterative function-minimization methods that use the minimum number of samples of the function. Ideally, I could pass in a set of already-sampled locations and the value at each location, and the algorithm would spit out a single new location at which to take the next sample (which may be an exploratory sample, not a guess at the best minimum location, if the algorithm thinks it's worthwhile to test).

I can likely take hundreds of samples, but only hundreds, and most multidimensional minimization methods expect to take millions.

Currently I am doing the minimization manually, on a daily cadence, using my own ad-hoc invention. I have, say, 40 existing timing samples for my 15-parameter model. I fit all of the existing samples to a sum of independent quadratics (making the big initial assumption that the parameters are independent), then look at each of the N*(N-1)/2 ≈ 100 possible off-diagonal (correlation) entries of the full quadratic matrix. I find the few individual matrix entries that, when allowed to move away from 0.0, give the best quadratic fit to my data, and let those few entries take their best least-squares values. I also give locations with small (faster) values higher weight in the fit (a bit ad hoc, but useful for discounting behavior far from the minimum). Once I have this matrix, I manually look at graphs along each of the major eigenvalue directions and eyeball locations which seem to need better sampling, then recombine those guesses into new sample locations. Each day I tend to generate 4 new points, set up a run to test them over the next day, and repeat the whole thing once the computation is done. Weekends get 10-point batches!
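For concreteness, here is a rough sketch of what that surrogate-fitting step looks like in code. The function names, the specific weighting scheme, and the synthetic data are just illustrative, not my actual pipeline:

```python
import numpy as np
from itertools import combinations

def design_matrix(X, cross_pairs=()):
    """Columns: 1, x_i, x_i^2 for each parameter, plus x_i*x_j for the chosen pairs."""
    cols = [np.ones(len(X))]
    for i in range(X.shape[1]):
        cols.append(X[:, i])
        cols.append(X[:, i] ** 2)
    for i, j in cross_pairs:
        cols.append(X[:, i] * X[:, j])
    return np.column_stack(cols)

def weighted_fit(A, y, w):
    """Weighted least squares: scale rows by sqrt(w) and solve."""
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef

def greedy_cross_terms(X, y, w, n_terms=3):
    """Greedily add the few x_i*x_j cross terms that most reduce the weighted residual."""
    chosen = []
    for _ in range(n_terms):
        best_pair, best_err = None, np.inf
        for pair in combinations(range(X.shape[1]), 2):
            if pair in chosen:
                continue
            A = design_matrix(X, chosen + [pair])
            err = np.sum(w * (A @ weighted_fit(A, y, w) - y) ** 2)
            if err < best_err:
                best_pair, best_err = pair, err
        chosen.append(best_pair)
    return chosen

# Synthetic stand-in for the real data: 40 timing samples of a 15-parameter model.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 15))
y = (X ** 2).sum(axis=1) + 0.5 * X[:, 0] * X[:, 3] + rng.normal(scale=0.01, size=40)

w = 1.0 / (y - y.min() + 1.0)      # crude: faster (smaller) runtimes get more weight
pairs = greedy_cross_terms(X, y, w)
coef = weighted_fit(design_matrix(X, pairs), y, w)
print("selected cross terms:", pairs)
```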

Thanks for any ideas! This question likely doesn't have a single "best" answer, but I'm stuck on what strategy would work best when the evaluation cost is so huge.

Best Answer

I've read the paper, but never used the approach.

"Efficient Global Optimization of Expensive Black-Box Functions" by: Donald R. Jones, Matthias Schonlau, William J. Welch

A PDF is available from one of the authors' pages.
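The EGO recipe in that paper (a Gaussian-process surrogate driven by an expected-improvement acquisition) is available in off-the-shelf form. Here is a rough sketch of the ask/tell loop using scikit-optimize; `run_simulation` and the parameter ranges are placeholders for your 8-hour timing run, and this is a sketch of the general approach, not the paper's reference implementation:

```python
from skopt import Optimizer
from skopt.space import Real

def run_simulation(params):
    # Placeholder for the real 8-hour timing run; a cheap synthetic stand-in here.
    return sum((p - 1.0) ** 2 for p in params)

# Rough per-parameter ranges; you would plug in your known scales instead.
space = [Real(0.1, 10.0, name=f"p{i}") for i in range(15)]
opt = Optimizer(space, base_estimator="GP", acq_func="EI")

# Existing measurements can seed the surrogate before asking for new points, e.g.
# opt.tell(X_previous, y_previous)  # lists of parameter vectors and their runtimes

for _ in range(4):        # one day's batch of proposals
    x_next = opt.ask()    # may be exploratory rather than the current best guess
    opt.tell(x_next, run_simulation(x_next))

print("best runtime so far:", min(opt.yi))
```

This matches the workflow you describe: the optimizer keeps the full history of sampled points, and each `ask` proposes a single new location chosen to balance exploring uncertain regions against refining the current minimum.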
