[Math] What’s a good way to provide an initial guess to “minimize $\sin(x)-e^{\cos(x^{2})} - x$ on $[0,1]$ with scipy.optimize.minimize”

numerical optimization, optimization

I want to minimize $h(x) = \sin(x) - e^{\cos(x^{2})} - x$ on $[0,1]$ as an exercise in writing generic solvers.

Here's what I have done in Python.

I defined my function.

>>> def h(x):
...     return numpy.sin(x) - numpy.exp(numpy.cos(x**2)) - x

After importing what's needed, I ran the minimize function with an initial guess of $0.5$:

>>> scipy.optimize.minimize(h,[0.5,],bounds=((0,1),))

I obtained the following output:

      fun: array([-2.71828183])
 hess_inv: <1x1 LbfgsInvHessProduct with dtype=float64>
      jac: array([ 0.])
  message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
     nfev: 4
      nit: 1
   status: 0
  success: True
        x: array([ 0.])

The minimum looks to be $-e$ at $x = 0$, which seems believable.

Curiously, changing the initial guess to $0.48$ from $0.50$, I obtained the following:

>>> scipy.optimize.minimize(h,[0.48,],bounds=((0,1),))
      fun: array([-2.71831419])
 hess_inv: <1x1 LbfgsInvHessProduct with dtype=float64>
      jac: array([ -3.81916720e-06])
  message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
     nfev: 24
      nit: 3
   status: 0
  success: True
        x: array([ 0.09182462])

Indeed, tossing my function into Wolfram Alpha gives a minimum of roughly $-2.71831$ at $x \approx 0.0919095$.

I'm curious to know if there are some good ways to provide an initial guess when one is blind to where the solution ought to be. I chose an initial guess of $x = 0.5$ because it felt "natural" to choose the midpoint of the domain.

When I provided an initial guess of $x = 0$, I got …

>>> scipy.optimize.minimize(h,[0],bounds=((0,1),))
      fun: array([-2.71828183])
 hess_inv: <1x1 LbfgsInvHessProduct with dtype=float64>
      jac: array([ 0.])
  message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
     nfev: 2
      nit: 0
   status: 0
  success: True
        x: array([ 0.])

I'm back to having a minimum at $x = 0$ …

And more hack work with the initial guesses: for an initial guess of $x = 0.01$, scipy.optimize.minimize tells me the minimum is at $x = 0.01004454$ with a value of $-2.71828198$. However, an initial guess of $x = 0.02$ gives $(0.09188086,-2.71831419)$.
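
Something like this quick sweep (a rough sketch, assuming numpy has also been imported as np) shows how the reported minimum wanders with the initial guess:

import numpy as np
import scipy.optimize

# Sweep the initial guess across [0, 1] and see where the solver lands.
for x0 in np.linspace(0.0, 1.0, 11):
    res = scipy.optimize.minimize(h, [x0], bounds=((0, 1),))
    print(x0, res.x, res.fun)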

Of course, I can graph the function, but I still may dumbly give an initial guess of $x = 0$.

[plot of $h(x)$ on $[0,1]$]

Zooming in to $0 \leq x \leq 0.1$ I see this:

[plot of $h(x)$ on $0 \leq x \leq 0.1$]

Anyway, the question is: what's a good way to provide an initial guess without implicitly solving the problem? I know that optimization algorithms can get stuck on local extrema, but I thought for sure in the case of $h(x)$ there shouldn't be an issue. Changing tolerance and maximum number of iterations didn't help either.

As two asides: I was a little surprised by the varying results from scipy.optimize.minimize. And for anyone curious, this all came about because I was putting together a genetic algorithm for a course I'm teaching. When I asked it to minimize $h$, it found $x = 0.0946478$ with $h(x) = -2.71831401$, which agrees to six places with Wolfram Alpha's value as well as scipy.optimize.minimize's solution after I gave it a little help. And that is what sent me down this hole.

Best Answer

The problem: multiple local optima

The derivative of your function vanishes at two points on the $[0,1]$ interval, and these correspond to the two solutions you have found: $x = 0$ is a stationary point ($h'(0) = 0$, so the projected-gradient test is satisfied immediately), while $x \approx 0.092$ is the actual minimum. A gradient-based local method stops at whichever of the two it reaches, depending on your initial guess x0. These methods make no attempt to pick the best solution among several stationary points, as global minimization methods do. Unfortunately, scipy.optimize.minimize deals exclusively with local optimization and does not implement any global method.
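
To see this concretely, here is a rough sketch that samples the derivative $h'(x) = \cos(x) + 2x\sin(x^{2})\,e^{\cos(x^{2})} - 1$ (worked out by hand) on a fine grid and reports where it changes sign:

import numpy as np

def dh(x):
    # Hand-derived derivative of h(x) = sin(x) - exp(cos(x**2)) - x
    return np.cos(x) + 2.0 * x * np.sin(x**2) * np.exp(np.cos(x**2)) - 1.0

xs = np.linspace(0.0, 1.0, 10001)
ds = dh(xs)
# Grid points where the derivative changes sign bracket a stationary point;
# h'(0) is exactly zero, so x = 0 shows up as well.
print(xs[:-1][np.sign(ds[:-1]) != np.sign(ds[1:])])   # roughly [0., 0.0919]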

Solution: global optimization via multi-starting local methods

One way to find the global minimum is multi-start local minimization: you choose multiple starting points at random, run a local method from each, and return the smallest local minimum that you find.

import numpy as np
import scipy.optimize

def multistart_minimize(h, n_trials=100):
    """Multi-start local optimization of a scalar function h on [0, 1]."""
    best = None
    for _ in range(n_trials):
        # Random starting point in [0, 1) for each local run.
        now = scipy.optimize.minimize(h, np.random.rand(), bounds=((0.0, 1.0),))
        # Keep the successful run with the smallest objective value.
        if now.success and (best is None or best.fun > now.fun):
            best = now
    return best
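
A possible usage, just printing the best point found:

best = multistart_minimize(h)
print(best.x, best.fun)   # expect roughly x ≈ 0.0919 and fun ≈ -2.71831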

Other ways?

You might find multi-start local optimization unsatisfactory because the outcome is random; indeed, some runs might fail to find the global optimum. Increasing the number of trials to counteract this is a rather blunt fix that drives up computation time.

The global optimization literature offers a number of deterministic algorithms that avoid the problem of random results, as well as other randomized algorithms that may work better than multi-start for one use case or another.
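
For what it's worth, scipy does ship a few global optimizers alongside minimize in the scipy.optimize module. A minimal sketch using the (randomized) scipy.optimize.differential_evolution on this problem, with a fixed seed so the run is reproducible:

import scipy.optimize

# differential_evolution is a stochastic global optimizer; seeding it makes
# the result reproducible from run to run.
result = scipy.optimize.differential_evolution(h, bounds=[(0.0, 1.0)], seed=0)
print(result.x, result.fun)   # expect roughly x ≈ 0.0919, fun ≈ -2.71831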

My personal take on optimization: it might be worth trying to approximate the function you actually want to minimize by a convex surrogate with a similar minimizer. A convex function has no local minima other than the global one, so local optimization methods can find its minimum very efficiently.
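
As a crude sketch of that idea on this very problem (the quadratic below is a poor approximation of $h$, but its minimizer makes a sensible starting point to hand back to the local solver):

import numpy as np
import scipy.optimize

# Fit a quadratic surrogate a*x**2 + b*x + c to samples of h on [0, 1].
xs = np.linspace(0.0, 1.0, 50)
a, b, c = np.polyfit(xs, h(xs), 2)

# If the fit is convex (a > 0), its minimizer is -b/(2a); clip it to the
# feasible interval and use it as the initial guess for the real local solve.
x0 = np.clip(-b / (2.0 * a), 0.0, 1.0) if a > 0 else 0.5
result = scipy.optimize.minimize(h, [x0], bounds=((0.0, 1.0),))
print(x0, result.x, result.fun)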
