[Math] meaningful distinction between “direct” and “iterative” methods for solving equations

numerical methods

I'll motivate this question with an example.

The Abel–Ruffini theorem states that there is no general "formula" for the roots of polynomials of degree greater than 4. (Specifically, it states that no such solution can be expressed in terms of radicals.) This lack of a direct formula means that, in order to compute the roots of e.g. a generic quintic, we must resort to iterative root-finding algorithms.

The distinction occurs in other problems as well, such as in linear algebra. "Direct" techniques use a "formula", whereas "iterative" techniques iterate until convergence. The same goes for linear ordinary differential equations with constant coefficients—which have direct "formulas" in terms of complex exponentials—as opposed to other systems, which don't.

But as much as I like to pretend I understand that, I simply don't understand the distinction between direct and iterative methods. The way I see it, pretty much every numerical method is iterative—even computing a square root requires an iterative method such as Newton's method to converge to the answer within a desired tolerance ϵ. I just don't see why it's such a big deal that there is no "formula" for quintics—even if there were one, the computer would still have to use an iterative method like Newton's method to evaluate it anyway. And I'm not even sure how to make a meaningful definition of the word "formula" that distinguishes between the two in the first place.
So where exactly is the distinction and why is it such a big deal?
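For concreteness, here is the kind of iteration I have in mind—a minimal sketch of Newton's method for square roots, run until a chosen residual tolerance is met:

```python
import math

def newton_sqrt(a, eps=1e-12):
    """Approximate sqrt(a) by Newton's method on f(x) = x^2 - a.

    Iterates until the residual |x^2 - a| drops below the tolerance eps.
    """
    x = a if a > 1 else 1.0  # crude initial guess
    while abs(x * x - a) > eps:
        x = 0.5 * (x + a / x)  # Newton update for f(x) = x^2 - a
    return x

print(newton_sqrt(2.0))  # agrees with math.sqrt(2) to many digits
```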

Best Answer

I wouldn't agree that direct techniques always use a "formula". For instance, in solving linear systems of equations "directly", one typically doesn't use a closed-form solution, but instead something like LU or Cholesky decomposition.
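To illustrate what "direct" means here, the following is a minimal LU solve. It is a sketch without pivoting (so it assumes no zero pivot arises); real implementations pivot for numerical stability:

```python
import numpy as np

def lu_solve(A, b):
    """Solve Ax = b by LU factorization (sketch: no pivoting)."""
    n = len(b)
    L = np.eye(n)
    U = A.astype(float)  # astype copies, so A itself is untouched
    # Elimination: the whole factorization must run to completion
    # before we can extract any solution at all.
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    # Forward substitution: solve L y = b.
    y = np.zeros(n)
    for i in range(n):
        y[i] = b[i] - L[i, :i] @ y[:i]
    # Back substitution: solve U x = y.
    x = np.zeros(n)
    for i in reversed(range(n)):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(lu_solve(A, b))  # matches np.linalg.solve(A, b)
```

Note that there is no "current approximation" anywhere in this algorithm: until the loops finish, you have no solution at all.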

Iterative solvers, on the other hand, would be things like Gauss-Seidel iteration, SOR (successive over-relaxation), Krylov subspace methods like conjugate gradients, or multilevel methods.

So what's the big difference? In the case of linear systems, I'd say the following are two of the main differences:

With a direct method, you have to do a certain amount of work, and then you obtain your solution. For instance, if you're doing LU factorization, you can't stop halfway through and say "alright, that's good enough". (There is something called incomplete LU, but it's actually used inside iterative methods!) To get a solution, you need to run the algorithm to the end. Typically, the solution you then get will be close to the exact one.

With iterative methods, you repeatedly update your current guess and (hopefully) get a bit closer to the true solution. This means that you can decide how much work you want to invest depending on how accurate you need your solution to be. For instance, CG is known to converge to the exact solution in $n$ steps for a matrix of size $n$, and was historically first seen as a direct method because of this. However, after a while people figured out that it works really well if you just stop the iteration much earlier: often you will get a very good approximation after far fewer than $n$ steps. In fact, we can analyze how fast CG converges. The end result is that today CG is used as an iterative method for large linear systems.
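A minimal CG sketch (assuming $A$ is symmetric positive definite) shows how this "stop when it's good enough" knob works; the test matrix below is purely illustrative:

```python
import numpy as np

def cg(A, b, tol=1e-8, maxiter=None):
    """Conjugate gradients for symmetric positive definite A.

    Stops as soon as the residual norm drops below tol -- possibly
    far earlier than the n steps needed for exact convergence.
    """
    n = len(b)
    maxiter = maxiter if maxiter is not None else n
    x = np.zeros(n)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs = r @ r
    for it in range(maxiter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:  # "good enough" -- stop early
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, it + 1

# A well-conditioned SPD matrix of size n = 200 (an assumption for the demo):
n = 200
rng = np.random.default_rng(0)
M = rng.standard_normal((n, n))
A = M.T @ M / n + np.eye(n)
b = rng.standard_normal(n)
x, its = cg(A, b)
print(its, np.linalg.norm(A @ x - b))  # far fewer than 200 iterations
```

How much earlier you can stop depends on the condition number of $A$, which is exactly what the convergence analysis quantifies.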

The second big difference is that, for a direct method, you generally need to have the entire matrix stored in memory. Not so in iterative methods: here, you typically only need a way to compute the application of the matrix to a vector, i.e., the matrix-vector product. So if you have a very large matrix, but you can relatively quickly compute its application to a vector (for instance, because the matrix is very sparse or has some other kind of structure), you can use this to your advantage.
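For example, the action of the tridiagonal 1-D Laplacian (second-difference) matrix on a vector can be computed in $O(n)$ time and memory without ever forming the matrix—a sketch of the kind of matrix-vector product an iterative method needs:

```python
import numpy as np

def apply_laplacian(v):
    """Multiply by the tridiagonal 1-D Laplacian (2 on the diagonal,
    -1 on the off-diagonals) without ever storing the matrix."""
    out = 2.0 * v
    out[:-1] -= v[1:]   # superdiagonal contribution
    out[1:] -= v[:-1]   # subdiagonal contribution
    return out

v = np.arange(1.0, 6.0)
print(apply_laplacian(v))  # [0. 0. 0. 0. 6.]
```

An iterative solver like CG only ever calls such a function, so the dense $n \times n$ matrix need never exist in memory.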
