As Ethan Bolker points out in the comments, you can't make blanket statements about which method to use. You look at each particular problem and use insights from that problem to make a decision.
Newton's method (and in particular its "hardened" variants that include a line search or trust region) is so popular because it is so often successful at solving black-box optimization problems, even non-convex ones well outside the regime where theoretical guarantees exist. BFGS (a quasi-Newton method that builds a Hessian approximation from gradient differences) and Gauss-Newton (for nonlinear least-squares problems) are also very popular relatives.
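To make "hardened" concrete, here is a minimal sketch of a damped Newton iteration with an Armijo backtracking line search (one standard way to add a line search). The test problem (Rosenbrock), the fallback to the gradient direction, and all constants are illustrative choices, not taken from any particular reference.

```python
import numpy as np

def damped_newton(f, grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton's method 'hardened' with an Armijo backtracking line search."""
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(hess(x), -g)   # Newton direction: H p = -g
        if g @ p >= 0:
            p = -g                         # Hessian not helpful here; fall back to steepest descent
        t, fx = 1.0, f(x)
        while f(x + t * p) > fx + 1e-4 * t * (g @ p) and t > 1e-12:
            t *= 0.5                       # backtrack until the Armijo condition holds
        x = x + t * p
    return x

# Illustrative non-convex test problem: the 2D Rosenbrock function
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([
    -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
    200 * (x[1] - x[0]**2),
])
hess = lambda x: np.array([
    [2 - 400 * (x[1] - 3 * x[0]**2), -400 * x[0]],
    [-400 * x[0], 200.0],
])
print(damped_newton(f, grad, hess, np.array([-1.2, 1.0])))  # converges to ~ [1, 1]
```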
For concrete classes of problems, coming up with efficient and robust quasi-Newton schemes is often an active area of research. For example, here's a recent paper that studies this for the equations arising from the physics of elastic bodies: Blended Cured Quasi-Newton for Geometry Optimization.
EDIT: as for mitigation strategies, there are many approaches. Sometimes you can analyze the Hessian and argue that its more computationally intensive terms are "negligible" and can be dropped. Or you compute the full Hessian only every so many iterations and perform cheap rank-one updates in between (see the sketch below). Etc.
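As a concrete (hypothetical) illustration of that last strategy, here is a sketch that rebuilds the exact Hessian only every `recompute_every` iterations and applies symmetric rank-one (SR1) updates in between; the recompute period, the safeguard threshold, and the test function are all illustrative choices.

```python
import numpy as np

def newton_periodic_hessian(grad, hess, x0, recompute_every=5,
                            tol=1e-10, max_iter=100):
    """Newton-type iteration: exact Hessian every few steps, SR1 updates in between."""
    x = x0.astype(float)
    H = hess(x)                        # exact Hessian (the expensive part)
    g = grad(x)
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        s = np.linalg.solve(H, -g)     # (quasi-)Newton step
        x_new = x + s
        g_new = grad(x_new)
        if (k + 1) % recompute_every == 0:
            H = hess(x_new)            # periodic exact rebuild
        else:
            # SR1 rank-one update: enforces the secant equation H_new s = y
            # without calling hess() again; skipped when the denominator is tiny.
            y = g_new - g
            r = y - H @ s
            denom = r @ s
            if abs(denom) > 1e-8 * np.linalg.norm(r) * np.linalg.norm(s):
                H = H + np.outer(r, r) / denom
        x, g = x_new, g_new
    return x

# Illustrative smooth convex test: f(x) = sum(x_i^4/4 + x_i^2/2), minimizer at 0
grad = lambda x: x**3 + x
hess = lambda x: np.diag(3 * x**2 + 1)
print(newton_periodic_hessian(grad, hess, np.array([2.0, -1.5])))  # ~ [0, 0]
```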
Note also that for many large problems, the cost of the linear solve in Newton's method dominates the cost of constructing the Hessian in the first place.
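When the solve is the bottleneck, one standard response (sketched below under that assumption; the SciPy setup and the finite-difference Hessian-vector product are illustrative choices) is to solve the Newton system only approximately with a Krylov method driven by Hessian-vector products, so that the Hessian is never formed explicitly at all.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def newton_cg_step(grad, x, cg_iters=50):
    """One inexact Newton step: solve H p = -g approximately with CG,
    using only Hessian-vector products (no explicit Hessian matrix).
    CG assumes the Hessian is positive definite; trust-region variants
    are the usual way to handle the indefinite case."""
    g = grad(x)

    def hessvec(v):
        # Finite-difference Hessian-vector product: H v ~ (grad(x + eps*v) - grad(x)) / eps
        eps = 1e-6
        return (grad(x + eps * v) - g) / eps

    H = LinearOperator((x.size, x.size), matvec=hessvec)
    p, _ = cg(H, -g, maxiter=cg_iters)
    return x + p

# Illustrative usage on a random convex quadratic, where one step suffices
n = 500
A = np.random.randn(n, n); A = A @ A.T + n * np.eye(n)
b = np.random.randn(n)
x = newton_cg_step(lambda x: A @ x - b, np.zeros(n))   # x ~ solve(A, b)
```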
Best Answer
In machine learning, the interest in solving "function $= 0$" conditions usually comes from minimization: we minimize $f$ by solving $\nabla f = 0$. Since $\nabla f$ is already a first derivative, Newton's method applied to it needs the second derivative, the Hessian $\nabla^2 f$, which is very expensive to form and solve with in high dimensions.
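A small sketch of why, on a convex quadratic model problem (an illustrative choice): one gradient-descent step costs $O(n)$ once the gradient is in hand, while one Newton step needs the full $n \times n$ Hessian and an $O(n^3)$ dense solve.

```python
import numpy as np

n = 2000                        # "high-dimensional" enough for a dense solve to hurt
A = np.random.randn(n, n)
A = A @ A.T + n * np.eye(n)     # symmetric positive definite, so f below is convex
b = np.random.randn(n)

# f(x) = 0.5 x^T A x - b^T x,  so  grad f(x) = A x - b  and  hess f(x) = A
x = np.zeros(n)
g = A @ x - b

# One gradient-descent step: O(n) beyond the gradient evaluation
x_gd = x - 1e-4 * g

# One Newton step: solve the n-by-n system (hess f) dx = -grad f,
# which needs the full O(n^2) Hessian in memory and an O(n^3) factorization
dx = np.linalg.solve(A, -g)
x_newton = x + dx               # the exact minimizer here, since f is quadratic
```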
The cubic approach you linked is unfamiliar to me. I was hoping it would be Halley's method, but it seems to be something different.
Newton's method isn't considered a form of gradient descent: GD simply steps along the negative gradient with some step size, while Newton scales its step by the inverse Hessian so that each iteration jumps to the root of a local model. Newton's method converges quadratically, which is a bit of a double-edged sword, since that speed comes at the cost of robustness far from the solution; GD settles for slower but somewhat safer linear convergence. The sketch below illustrates the contrast.
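A tiny numerical illustration of that contrast (the test function $f(x) = x^2 + e^x$ and the GD step size are illustrative choices): Newton's error roughly squares from one iteration to the next, while fixed-step GD's error shrinks by a roughly constant factor.

```python
import math

# Minimize f(x) = x^2 + e^x by driving f'(x) = 2x + e^x to zero.
fp = lambda x: 2 * x + math.exp(x)    # f'
fpp = lambda x: 2 + math.exp(x)       # f''

xstar = -0.35173371124919584          # reference minimizer, -W(1/2), to machine precision

x_newton = x_gd = 0.0
for k in range(8):
    x_newton -= fp(x_newton) / fpp(x_newton)   # Newton: error roughly squares
    x_gd -= 0.2 * fp(x_gd)                     # fixed-step GD: error shrinks by ~constant factor
    print(f"{k}: newton err {abs(x_newton - xstar):.2e}, "
          f"gd err {abs(x_gd - xstar):.2e}")
```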