Defining the gradient function argument in R's optim function


I want to find the minimum of a function with respect to some parameters using the optim function.

I wonder: does defining a gradient function for the objective function just speed up the optimization?

Or does defining it give better results? For example, in some situations the optimization algorithm produces unreliable optimum points. Does defining a gradient function fix such results?

I would be very grateful for any help. Thanks a lot.

Best Answer

In R, the default method used in optim is Nelder-Mead, which does not use gradients for optimization. As such, it's pretty slow. ?optim states that this method is robust, but I'd say that's false advertising: it can often return a sub-optimal solution for easy problems without any warning.

Because this method does not use gradients, supplying a gradient function while using the default Nelder-Mead method does not change the procedure at all.
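A minimal sketch of this, using the Rosenbrock function from the examples in `?optim` (the names `fr`, `grr` and `x0` are just illustrative): with the default method, passing a gradient leaves the result unchanged, and `counts` reports no gradient calls.

```r
# Rosenbrock "banana" function and its analytic gradient
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
grr <- function(x) {
  c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
     200 * (x[2] - x[1]^2))
}

x0 <- c(-1.2, 1)
nm_plain <- optim(x0, fr)         # default method: Nelder-Mead
nm_grad  <- optim(x0, fr, grr)    # same call, gradient supplied

# The gradient is ignored, so both runs follow exactly the same path
identical(nm_plain$par, nm_grad$par)
nm_grad$counts["gradient"]        # NA: gr was never called
```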

On the other hand, the quasi-Newton methods (BFGS or L-BFGS-B) and the conjugate gradient method (CG) do require evaluating the gradient during optimization. If no gradient function is supplied, the gradient is estimated numerically by finite differences, i.e.

$f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}$

for some small $h$.
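To make the formula concrete, here is a hand-written central-difference gradient applied one coordinate at a time. This is a sketch, not optim's internal code: `num_grad` is a hypothetical helper, and `h = 1e-3` only roughly mirrors optim's default `ndeps`, which optim applies on the `par/parscale` scale.

```r
# Central-difference estimate of the gradient of f at x:
# one pair of evaluations, f(x + h e_i) and f(x - h e_i), per coordinate i
num_grad <- function(f, x, h = 1e-3) {
  k <- length(x)
  g <- numeric(k)
  for (i in seq_len(k)) {
    e <- numeric(k)
    e[i] <- h                                 # perturb only coordinate i
    g[i] <- (f(x + e) - f(x - e)) / (2 * h)
  }
  g
}

# Rosenbrock function and its analytic gradient, for comparison
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
grr <- function(x) {
  c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
     200 * (x[2] - x[1]^2))
}

x0 <- c(-1.2, 1)
num_grad(fr, x0)   # numerical estimate
grr(x0)            # analytic value, c(-215.6, -88)
```

Here the first component differs from the analytic value by roughly 5e-4: the central-difference error shrinks like $h^2$ but grows with the higher-order derivatives of $f$.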

If the function is relatively cheap to evaluate, or the number of parameters is not too high, this is typically fine, and you can save yourself the time of deriving and coding the full gradient.
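A sketch of the small-problem case on the two-parameter Rosenbrock function from `?optim` (`fr`, `grr` and `x0` are illustrative names): BFGS without `gr` falls back to finite differences and, on a problem this small, should still end up close to the true minimum at `c(1, 1)`.

```r
# Rosenbrock function (minimum at c(1, 1)) and its analytic gradient
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
grr <- function(x) {
  c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
     200 * (x[2] - x[1]^2))
}
x0 <- c(-1.2, 1)

# No gr: optim approximates the gradient by finite differences
bfgs_numeric  <- optim(x0, fr, method = "BFGS")
# Analytic gradient supplied through the gr argument
bfgs_analytic <- optim(x0, fr, gr = grr, method = "BFGS")

# Both estimates should be close to c(1, 1)
rbind(numeric = bfgs_numeric$par, analytic = bfgs_analytic$par)
```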

On the other hand, for many problems with large numbers of parameters, calculating the full gradient vector numerically can be prohibitively slow. Remember, if you have $k$ parameters, the above calculation needs to be done $k$ times, i.e. $2k$ extra function evaluations for every gradient. Also, for problems with large higher-order derivatives, this numerical approximation may be inaccurate or unstable, so supplying a function that evaluates the derivative analytically may stabilize the algorithm. Typically, though, the motivation for supplying an analytic gradient is the speed of computing each gradient.
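One way to see the cost is to count every call to the objective. The sketch below wraps the function in a hypothetical counter (`make_counted` is illustrative, not part of R); the wrapper is needed because, according to `?optim`, its own `counts` exclude the calls used for finite-difference gradients.

```r
# Rosenbrock function and its analytic gradient
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
grr <- function(x) {
  c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
     200 * (x[2] - x[1]^2))
}
x0 <- c(-1.2, 1)

# Wrap an objective so that every call to it is counted, including the
# calls optim makes internally for finite-difference gradients
make_counted <- function(f) {
  n <- 0
  list(fn    = function(x) { n <<- n + 1; f(x) },
       calls = function() n)
}

cnt_numeric <- make_counted(fr)
res_numeric <- optim(x0, cnt_numeric$fn, method = "BFGS")

cnt_analytic <- make_counted(fr)
res_analytic <- optim(x0, cnt_analytic$fn, gr = grr, method = "BFGS")

# Total objective evaluations actually performed by each run
c(numeric = cnt_numeric$calls(), analytic = cnt_analytic$calls())

# optim's own counts leave out the finite-difference calls
res_numeric$counts
```

With only two parameters the gap is modest; since each numerical gradient adds about $2k$ calls, it widens as $k$ grows, while the analytic version's call count does not depend on $k$ in this way.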
