[Math] Basic Question about Newton’s Method for Optimization

optimization

This is a very basic question about Newton's method for optimization, but I cannot seem to find the answer in any of my searches. If we are using Newton's method (or gradient descent), how do we find a maximum instead of a minimum? Do we just change the sign of the step size to positive instead of negative?
Related Solutions
The quick answer would be: because Newton's method is a higher-order method, and thus builds a better approximation of your function. But that is not all.
Newton's method exactly minimizes the second-order approximation of a function $f$ at each step. That is, it iteratively sets $$ x \gets x - \left[ \nabla^2 f(x) \right]^{-1} \nabla f(x). $$
Gradient descent has access only to the first-order approximation, and makes the update $$ x \gets x - h \nabla f(x) $$ for some step size $h$.
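For concreteness, here is a minimal sketch of both updates in Python; the quadratic test function, its gradient, and its Hessian are illustrative choices of mine, not anything from the original question:

```python
import numpy as np

# Illustrative strictly convex quadratic: f(x) = 0.5 x^T A x - b^T x
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])    # positive definite; plays the role of the Hessian
b = np.array([1.0, 1.0])

def grad(x):
    return A @ x - b          # gradient of f

def hess(x):
    return A                  # Hessian of f; constant because f is quadratic

x0 = np.zeros(2)

# Gradient-descent update: x <- x - h * grad f(x)
h = 0.1
x_gd = x0 - h * grad(x0)

# Newton update: x <- x - [hess f(x)]^{-1} grad f(x)
x_newton = x0 - np.linalg.solve(hess(x0), grad(x0))

print(x_gd)      # a small step along the negative gradient
print(x_newton)  # A^{-1} b, the exact minimizer: one Newton step suffices here
```

Because $f$ is quadratic, its second-order approximation is exact, so the single Newton step lands on the minimizer; gradient descent would need many steps of size $h$ to get there.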
The practical difference is that Newton's method assumes you have much more information available, makes much better updates, and thus converges in fewer iterations.
If you don't have any further information about your function and you are able to use Newton's method, just use it.
But the number of iterations needed is not all you want to know. The Newton update scales poorly with the problem size: if $x \in \mathbb{R}^d$, then computing $\left[ \nabla^2 f(x) \right]^{-1} \nabla f(x)$ takes $\mathcal{O}(d^3)$ operations in general. The cost of a gradient-descent update, on the other hand, is linear in $d$.
In many large-scale applications, which arise very often in machine learning for example, $d$ is so large (it can be in the billions) that you are far beyond being able to make even a single Newton update.
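A rough sketch of that scaling argument (the sizes and matrices below are made up for illustration): a gradient step touches each coordinate once, while the Newton step requires solving a $d \times d$ linear system, which costs $\mathcal{O}(d^3)$ with dense direct methods.

```python
import numpy as np

d = 2000                       # already slow; at d ~ 10^9 a dense solve is hopeless
rng = np.random.default_rng(0)

M = rng.standard_normal((d, d))
H = M @ M.T + d * np.eye(d)    # stand-in for a positive-definite Hessian
g = rng.standard_normal(d)     # stand-in for the gradient

x = np.zeros(d)

# Gradient-descent update: O(d) work given the gradient
x_gd = x - 0.01 * g

# Newton update: solving H p = g is O(d^3) in general
# (solving the system is still preferable to forming H^{-1} explicitly)
p = np.linalg.solve(H, g)
x_newton = x - p
```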
I would assume the 'true function' they are referring to is the $L^2$ norm, which they have defined to be $f(\mathbf{x})$.
Newton's method is just a root-finding algorithm. I believe the article you quoted is just distinguishing between applying it to a function and applying it to the derivative of a function. Since Newton's method is based on a linear approximation of the function whose root is sought, it gives the exact answer when applied to the derivative of a quadratic function. In fact, if you click on the link given in the second article, the iterative formula they give is identical to the standard Newton iteration, just applied to $f'$ rather than $f$.
To point 3: it is my understanding that there is only one Newton method, just used in different contexts. In this case, since the goal is finding a minimum of your function, you would be doing root finding on $f'$ rather than on $f$.
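To make that concrete, here is a small sketch (with an illustrative function of my own choosing) of the standard Newton root-finding iteration applied to $f'$; since $f'$ is linear when $f$ is quadratic, a single step lands on the exact minimizer:

```python
# Newton's root-finding iteration x <- x - g(x)/g'(x), applied to g = f'.

def newton_root(g, g_prime, x0, steps=10):
    x = x0
    for _ in range(steps):
        x = x - g(x) / g_prime(x)
    return x

# Illustrative quadratic: f(x) = (x - 3)^2, so f'(x) = 2(x - 3) and f''(x) = 2
f_prime = lambda x: 2.0 * (x - 3.0)
f_double_prime = lambda x: 2.0

# Root finding on f' locates the stationary point of f
print(newton_root(f_prime, f_double_prime, x0=0.0, steps=1))  # 3.0, exact in one step
```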
Best Answer
Yes, that's exactly what you do. You can think of this sign change as performing gradient ascent instead of gradient descent (in the case of a gradient method). Alternatively, you can think of flipping the sign in a gradient method as performing gradient descent on $-f$: by finding a minimum of $-f$, you find a maximum of $f$.
Similar reasoning holds for Newton's method (and various other methods as well).
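To illustrate both readings of the answer, here is a small sketch with a concave test function of my own choosing:

```python
# Illustrative concave function: f(x) = -(x - 2)^2 + 5, maximized at x = 2
f_grad = lambda x: -2.0 * (x - 2.0)   # f'(x)
f_hess = lambda x: -2.0               # f''(x) < 0 everywhere: f is concave

# Gradient ascent: flip the sign of the step, x <- x + h f'(x) ...
x = 0.0
for _ in range(200):
    x = x + 0.1 * f_grad(x)
print(x)  # ~ 2.0

# ... which is identical to gradient descent on -f:
y = 0.0
for _ in range(200):
    y = y - 0.1 * (-f_grad(y))        # the gradient of -f is -f'
print(y)  # ~ 2.0

# Newton's update x <- x - f'(x)/f''(x) needs no sign change at all:
# replacing f with -f flips both f' and f'', leaving the step unchanged.
z = 0.0
z = z - f_grad(z) / f_hess(z)
print(z)  # 2.0 in one step
```

Note that the Newton update itself is unchanged when $f$ is replaced by $-f$, since the gradient and the Hessian both flip sign: the iteration seeks a stationary point, and it converges to the maximum when started near one.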