Newton's method and machine learning

There is some debate about why Newton's method is not widely used in machine learning, where people tend to use gradient descent instead.

Best Answer

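In machine learning, the reason to solve an equation of the form $g(x)=0$ is usually to minimize a loss $f$ by solving $\nabla f=0$. Since $\nabla f$ is already a first derivative, applying Newton's method to it requires the second derivative, the Hessian $\nabla^2 f$, giving the update $x_{k+1}=x_k-\left[\nabla^2 f(x_k)\right]^{-1}\nabla f(x_k)$. This is very expensive in high dimensions: with $n$ parameters the Hessian has $n^2$ entries, and solving the linear system at each step costs $O(n^3)$ operations with a dense direct solver.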
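A minimal NumPy sketch of this, assuming the gradient and Hessian are available as Python callables. The names `newton_minimize`, `grad` and `hess` are hypothetical and chosen for illustration; they are not from any library. Each step solves the $n\times n$ Newton system instead of forming the inverse, which is where the $O(n^2)$ memory and $O(n^3)$ time per step come from.

```python
import numpy as np


def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Sketch: minimize f by applying Newton's method to the equation grad(x) = 0.

    grad(x) must return the n-vector of first derivatives and hess(x) the
    n-by-n Hessian; both are hypothetical callables supplied by the caller.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:  # stationary point reached
            break
        # Solve H @ step = g rather than inverting H: O(n^3) time, O(n^2) memory.
        step = np.linalg.solve(hess(x), g)
        x = x - step
    return x


# Quadratic loss f(x) = 0.5 x^T A x - b^T x: a single Newton step is exact.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x_quad = newton_minimize(lambda x: A @ x - b, lambda x: A, np.zeros(2), max_iter=1)

# Non-quadratic loss f(x) = sum(cosh(x_i - 1)), minimized at x = (1, 1, 1).
x_cosh = newton_minimize(
    lambda x: np.sinh(x - 1.0),
    lambda x: np.diag(np.cosh(x - 1.0)),
    np.zeros(3),
)
print(x_quad, x_cosh)
```

The quadratic case needs only one step because the linearization of $\nabla f = Ax - b$ is exact; for the non-quadratic loss the iteration takes a few steps.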

I don't recognize the cubic approach you linked. I was hoping it would be Halley's method (the classic cubically convergent root-finder), but it seems to be something different.
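For comparison, here is a sketch of Halley's method for a scalar equation $g(x)=0$, using the standard update $x_{k+1}=x_k-\dfrac{2g(x_k)g'(x_k)}{2g'(x_k)^2-g(x_k)g''(x_k)}$. It converges cubically near a simple root, but it needs $g''$; in the minimization setting $g=\nabla f$, so that would mean third derivatives of $f$. This only illustrates Halley's method and is not a reconstruction of the linked approach; the function name `halley` is hypothetical.

```python
def halley(g, dg, d2g, x0, tol=1e-14, max_iter=50):
    """Sketch of Halley's method for a scalar root g(x) = 0.

    g, dg, d2g are callables for g, g', g''. Assumes the denominator stays
    nonzero along the iterates; there is no safeguarding here.
    """
    x = x0
    for _ in range(max_iter):
        gx, dgx, d2gx = g(x), dg(x), d2g(x)
        step = 2.0 * gx * dgx / (2.0 * dgx * dgx - gx * d2gx)
        x -= step
        if abs(step) < tol:
            break
    return x


# Example: the real cube root of 2 as the root of g(x) = x^3 - 2.
root = halley(lambda x: x**3 - 2.0, lambda x: 3.0 * x**2, lambda x: 6.0 * x, x0=1.0)
print(root)
```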

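Newton's method isn't considered a form of gradient descent (GD). GD takes steps $x_{k+1}=x_k-\eta\nabla f(x_k)$ with a step size $\eta$ that is not chosen to approximate the root, whereas each Newton step jumps to the root of the linearization of $\nabla f$ at $x_k$. Newton's method converges quadratically near a root, which is a bit of a double-edged sword: far from the root it can overshoot or diverge, and where the Hessian is not positive definite it can be drawn toward a saddle point or a maximum. GD settles for slower but somewhat safer convergence, which is typically linear (for example, on strongly convex problems with a suitably small fixed step size).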
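A small 1-D experiment can make this trade-off concrete. The loss $f(x)=\sqrt{1+x^2}-x/2$ is an arbitrary convex example chosen for illustration (minimized at $x^*=1/\sqrt{3}$), and all helper names are hypothetical. Started near $x^*$, plain Newton reaches a $10^{-10}$ tolerance in a handful of steps while GD with a fixed learning rate needs many more; started at $x_0=3$, plain Newton (no line search or damping) shoots off to huge values while GD still converges.

```python
import math

# Hypothetical 1-D convex loss f(x) = sqrt(1 + x^2) - x/2, minimized at x* = 1/sqrt(3).
X_STAR = 1.0 / math.sqrt(3.0)


def fprime(x):
    return x / math.sqrt(1.0 + x * x) - 0.5


def fsecond(x):
    # Always in (0, 1], so f is convex and f' is 1-Lipschitz.
    return (1.0 + x * x) ** -1.5


def newton_step(x):
    # Jump to the root of the tangent line of f' at x.
    return x - fprime(x) / fsecond(x)


def gd_step(x, lr=1.0):
    # Fixed learning rate; lr = 1 is safe here because f'' <= 1.
    return x - lr * fprime(x)


def iterations_to_converge(step, x0, tol=1e-10, max_steps=1000):
    """Number of steps until |x - x*| < tol, or None if that never happens."""
    x = x0
    for k in range(max_steps):
        if abs(x - X_STAR) < tol:
            return k
        x = step(x)
    return None


# Near x*: Newton needs far fewer steps than GD (quadratic vs linear rate).
newton_near = iterations_to_converge(newton_step, 0.5)
gd_near = iterations_to_converge(gd_step, 0.5)

# Far from x*: three plain Newton steps run away, while GD still converges.
x = 3.0
for _ in range(3):
    x = newton_step(x)
newton_far = x
gd_far = iterations_to_converge(gd_step, 3.0)

print(newton_near, gd_near, newton_far, gd_far)
```

Here the Newton update $x - f'(x)/f''(x)$ is a gradient step whose size $1/f''(x)$ comes from local curvature, which is the distinction drawn above. The step counts are specific to this toy function.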
