As Ethan Bolker points out in the comments, you can't make blanket statements about which method to use. You look at each particular problem and use insights from that problem to make a decision.
Newton's method (and in particular its "hardened" variants that include a line search or trust region) is so popular because it is so often successful at solving black-box optimization problems, even non-convex ones well outside the regime where theoretical guarantees exist. BFGS (a quasi-Newton method that builds a Hessian approximation from gradient differences) and Gauss-Newton (for nonlinear least-squares problems) are also very popular relatives.
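To make "hardened" concrete, here is a minimal sketch of a damped Newton iteration with an Armijo backtracking line search (one standard way to add a line search). The test problem (Rosenbrock), the fallback to the gradient direction, and all constants are illustrative choices, not taken from any particular reference.

```python
import numpy as np

def damped_newton(f, grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton's method 'hardened' with an Armijo backtracking line search."""
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(hess(x), -g)   # Newton direction: H p = -g
        if g @ p >= 0:
            p = -g                         # Hessian not helpful here; fall back to steepest descent
        t, fx = 1.0, f(x)
        while f(x + t * p) > fx + 1e-4 * t * (g @ p) and t > 1e-12:
            t *= 0.5                       # backtrack until the Armijo condition holds
        x = x + t * p
    return x

# Illustrative non-convex test problem: the 2D Rosenbrock function
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([
    -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
    200 * (x[1] - x[0]**2),
])
hess = lambda x: np.array([
    [2 - 400 * (x[1] - 3 * x[0]**2), -400 * x[0]],
    [-400 * x[0], 200.0],
])
print(damped_newton(f, grad, hess, np.array([-1.2, 1.0])))  # converges to ~ [1, 1]
```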
For concrete classes of problems, coming up with efficient and robust quasi-Newton schemes is often an active area of research. For example, here's a recent paper that studies this for the equations arising from the physics of elastic bodies: Blended Cured Quasi-Newton for Geometry Optimization.
EDIT: as for mitigation strategies, there are many approaches. Sometimes you can analyze the Hessian and argue that its more computationally intensive terms are "negligible" and can be dropped. Or you compute the full Hessian only every so many iterations and perform cheap rank-one updates in between (see the sketch below). Etc.
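As a concrete (hypothetical) illustration of that last strategy, here is a sketch that rebuilds the exact Hessian only every `recompute_every` iterations and applies symmetric rank-one (SR1) updates in between; the recompute period, the safeguard threshold, and the test function are all illustrative choices.

```python
import numpy as np

def newton_periodic_hessian(grad, hess, x0, recompute_every=5,
                            tol=1e-10, max_iter=100):
    """Newton-type iteration: exact Hessian every few steps, SR1 updates in between."""
    x = x0.astype(float)
    H = hess(x)                        # exact Hessian (the expensive part)
    g = grad(x)
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        s = np.linalg.solve(H, -g)     # (quasi-)Newton step
        x_new = x + s
        g_new = grad(x_new)
        if (k + 1) % recompute_every == 0:
            H = hess(x_new)            # periodic exact rebuild
        else:
            # SR1 rank-one update: enforces the secant equation H_new s = y
            # without calling hess() again; skipped when the denominator is tiny.
            y = g_new - g
            r = y - H @ s
            denom = r @ s
            if abs(denom) > 1e-8 * np.linalg.norm(r) * np.linalg.norm(s):
                H = H + np.outer(r, r) / denom
        x, g = x_new, g_new
    return x

# Illustrative smooth convex test: f(x) = sum(x_i^4/4 + x_i^2/2), minimizer at 0
grad = lambda x: x**3 + x
hess = lambda x: np.diag(3 * x**2 + 1)
print(newton_periodic_hessian(grad, hess, np.array([2.0, -1.5])))  # ~ [0, 0]
```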
Note also that for many large problems, the cost of the linear solve in Newton's method dominates the cost of constructing the Hessian in the first place.
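When the solve is the bottleneck, one standard response (sketched below under that assumption; the SciPy setup and the finite-difference Hessian-vector product are illustrative choices) is to solve the Newton system only approximately with a Krylov method driven by Hessian-vector products, so that the Hessian is never formed explicitly at all.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def newton_cg_step(grad, x, cg_iters=50):
    """One inexact Newton step: solve H p = -g approximately with CG,
    using only Hessian-vector products (no explicit Hessian matrix).
    CG assumes the Hessian is positive definite; trust-region variants
    are the usual way to handle the indefinite case."""
    g = grad(x)

    def hessvec(v):
        # Finite-difference Hessian-vector product: H v ~ (grad(x + eps*v) - grad(x)) / eps
        eps = 1e-6
        return (grad(x + eps * v) - g) / eps

    H = LinearOperator((x.size, x.size), matvec=hessvec)
    p, _ = cg(H, -g, maxiter=cg_iters)
    return x + p

# Illustrative usage on a random convex quadratic, where one step suffices
n = 500
A = np.random.randn(n, n); A = A @ A.T + n * np.eye(n)
b = np.random.randn(n)
x = newton_cg_step(lambda x: A @ x - b, np.zeros(n))   # x ~ solve(A, b)
```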
Best Answer
In machine learning, the interest in solving "function $= 0$" conditions usually comes from minimization: we minimize $f$ by solving $\nabla f = 0$. Since $\nabla f$ is already a first derivative, Newton's method applied to it needs the second derivative, the Hessian $\nabla^2 f$, which is very expensive to form and solve with in high dimensions.
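A small sketch of why, on a convex quadratic model problem (an illustrative choice): one gradient-descent step costs $O(n)$ once the gradient is in hand, while one Newton step needs the full $n \times n$ Hessian and an $O(n^3)$ dense solve.

```python
import numpy as np

n = 2000                        # "high-dimensional" enough for a dense solve to hurt
A = np.random.randn(n, n)
A = A @ A.T + n * np.eye(n)     # symmetric positive definite, so f below is convex
b = np.random.randn(n)

# f(x) = 0.5 x^T A x - b^T x,  so  grad f(x) = A x - b  and  hess f(x) = A
x = np.zeros(n)
g = A @ x - b

# One gradient-descent step: O(n) beyond the gradient evaluation
x_gd = x - 1e-4 * g

# One Newton step: solve the n-by-n system (hess f) dx = -grad f,
# which needs the full O(n^2) Hessian in memory and an O(n^3) factorization
dx = np.linalg.solve(A, -g)
x_newton = x + dx               # the exact minimizer here, since f is quadratic
```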
The cubic approach you linked is unfamiliar to me. I was hoping it would be Halley's method, but it seems to be something different.
Newton's method isn't considered a form of gradient descent: GD simply steps along the negative gradient with some step size, while Newton scales its step by the inverse Hessian so that each iteration jumps to the root of a local model. Newton's method converges quadratically, which is a bit of a double-edged sword, since that speed comes at the cost of robustness far from the solution; GD settles for slower but somewhat safer linear convergence. The sketch below illustrates the contrast.
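A tiny numerical illustration of that contrast (the test function $f(x) = x^2 + e^x$ and the GD step size are illustrative choices): Newton's error roughly squares from one iteration to the next, while fixed-step GD's error shrinks by a roughly constant factor.

```python
import math

# Minimize f(x) = x^2 + e^x by driving f'(x) = 2x + e^x to zero.
fp = lambda x: 2 * x + math.exp(x)    # f'
fpp = lambda x: 2 + math.exp(x)       # f''

xstar = -0.35173371124919584          # reference minimizer, -W(1/2), to machine precision

x_newton = x_gd = 0.0
for k in range(8):
    x_newton -= fp(x_newton) / fpp(x_newton)   # Newton: error roughly squares
    x_gd -= 0.2 * fp(x_gd)                     # fixed-step GD: error shrinks by ~constant factor
    print(f"{k}: newton err {abs(x_newton - xstar):.2e}, "
          f"gd err {abs(x_gd - xstar):.2e}")
```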