I want to find the maximum of my multidimensional optimization surface, but not just the highest local maximum: I want the maximum that looks the most stable to me. It should be not only tall but also wide.

I will explain with a picture of the one-dimensional case.

The picture shows two maxima. The red one is the global maximum, but I would not like to take it; it looks more like an outlier. I would like to take the green maximum instead, as it suits me better.

My question is: how can this be done in R, or is there a ready-made package that does this?

**UPD=======**

It should be a maximum point whose surroundings also give good results in terms of the fitness function:

- the wider the area around such a point, the better
- the higher the values of the fitness function in this area, the better

**UPD=======**

The fitness function is probably stochastic.

**UPD2========**

I am trying to find neural network weights for very unstable data with my fitness function.

Data transformations and cross-validation don't work for me.

I assume that the model will work if I find a fairly stable maximum when optimizing the weights.

I assume that if the maximum is wide, then the model will work better with unstable data.

The picture is just an example (I have no idea what the optimization surface actually looks like).

The bottom line is that I do not need the global maximum; I need a wide maximum for stable operation of the neural network.

## Best Answer

Several algorithms for outlier detection and removal could work.

From the image it seems that a simple smoothing of the curve would already reduce the weight of the narrow peaks, and that alone might do the job.

How to smooth or detect outliers for the best performance in your application (for instance, how strongly to smooth) depends on what exactly you want to achieve and what this maximum is supposed to do.

What do the peaks/outliers mean, how do they arise, and why do you not wish to take them into account?

Possibly a more advanced filter would work better than a simple moving average. The green curve in your picture also seems to do the job already.
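As a quick sketch of that idea in R (the two-peak curve and the bandwidth below are made-up illustrations, not your data): base R's `ksmooth` applies a kernel moving average, and after smoothing, the wide peak beats the narrow spike.

```r
## Toy 1-D surface (hypothetical): a tall narrow spike at x = 3
## and a lower but wide hill at x = 7
x <- seq(0, 10, length.out = 1001)
y <- exp(-(x - 3)^2 / (2 * 0.05^2)) +   # narrow, outlier-like peak
     0.8 * exp(-(x - 7)^2 / 2)          # wide, stable peak

## The raw maximum sits on the narrow spike
x[which.max(y)]                         # ~3

## Kernel smoothing spreads the spike's mass, so the wide peak wins
sm <- ksmooth(x, y, kernel = "normal", bandwidth = 2, x.points = x)
sm$x[which.max(sm$y)]                   # ~7
```

The bandwidth plays the role of "how wide must a maximum be to count": the larger it is, the more a peak needs sustained width to survive the blur.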

From the comments it seems that this is a situation where the points on the surface are not already sampled over the entire space (as in a grid search or parameter sweep), so it relates to an optimization problem that follows a path.

But in this case you can still make use of outlier detection and smoothing. Namely, you can incorporate the expressions that do the smoothing into your cost function. For instance, instead of evaluating the cost function at a single point, you could also sample points in the neighbourhood and take an average.

In this case you probably want a smoothing scheme that does not need too many data points and uses only local information. I imagine you could use some average, or possibly a small kernel, to blur the values.
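A minimal sketch of that neighbourhood-averaging trick (the `fitness` function, radius `h`, and sample count `n` are all hypothetical choices for illustration):

```r
## Hypothetical fitness surface: a narrow spike at w = 3, a wide hill at w = 7
fitness <- function(w) {
  exp(-sum((w - 3)^2) / (2 * 0.05^2)) + 0.8 * exp(-sum((w - 7)^2) / 2)
}

## Smoothed fitness: instead of one evaluation, average over n random
## points drawn from a Gaussian neighbourhood of scale h around w
smoothed_fitness <- function(w, h = 0.5, n = 200) {
  mean(replicate(n, fitness(w + rnorm(length(w), sd = h))))
}

set.seed(42)
fitness(3)            # ~1: the spike looks best pointwise
smoothed_fitness(3)   # small: the spike's neighbourhood is poor
smoothed_fitness(7)   # ~0.7: the wide hill survives the blur
```

Under the blurred objective the wide maximum scores higher than the spike, which is exactly the "tall but also wide" criterion. Since the fitness is stochastic anyway, the extra Monte-Carlo noise from averaging is no worse than what the optimiser already has to cope with.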

Several optimisers already sample the function at multiple points in order to estimate the derivative of the surface (the method `SANN`* in `optim` searches for an optimum by sampling several points and applying a Gauss-Markov kernel). Possibly you could tune them to obtain a few extra points and smooth more strongly at the same time. You can also do this yourself, by providing functions for the value and derivative of the cost function that apply the smoothing, and using those with an optimiser.

Some packages seem to exist that deal with this problem. For instance, `tgp` has a version of `optim` that works with noisy functions.

*Bélisle, Claude J. P. "Convergence theorems for a class of simulated annealing algorithms on ℝ^d." Journal of Applied Probability 29.4 (1992): 885-895.
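To make the do-it-yourself route concrete, here is an illustrative sketch (the two-peak `fitness`, the fixed offset grid, and the `SANN` settings are all assumptions, not a recipe): a locally blurred objective is passed straight to `optim`, which only needs function values when using `SANN`.

```r
## Hypothetical 1-D fitness: narrow spike at 3, wide hill at 7
fitness <- function(w) {
  exp(-(w - 3)^2 / (2 * 0.05^2)) + 0.8 * exp(-(w - 7)^2 / 2)
}

## Deterministic local blur: average the fitness over a fixed grid of
## offsets around w, so the optimiser sees a smoothed surface
offsets <- seq(-0.5, 0.5, length.out = 21)
smoothed <- function(w) mean(sapply(offsets, function(d) fitness(w + d)))

## SANN uses only function values, so it tolerates rough surfaces;
## optim minimises, hence the sign flip
set.seed(1)
fit <- optim(par = 5, fn = function(w) -smoothed(w),
             method = "SANN", control = list(maxit = 20000))
fit$par   # should land near the wide peak at 7, not the spike at 3
```

For a genuinely stochastic fitness you would replace the fixed offsets with random neighbourhood samples; `SANN` does not require the objective to be deterministic, though the cooling schedule (`temp`, `tmax`) may then need tuning.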