[Math] Eigenvectors and eigenvalues in iterative methods

eigenvalues-eigenvectors, linear-algebra, numerical-methods

I've been studying many iterative methods, such as Jacobi, fixed-point, Newton's method, and the conjugate gradient (CG) method.

Currently, I'm studying the CG method, and it's not the first time that the eigenvectors (and eigenvalues) of the matrix involved turn out to be important in determining how an iterative method behaves, e.g. how fast it converges.

In particular, I was reading the book "A First Course in Numerical Methods" (by Ascher and Greif), and at a certain point (pp. 186-7) it says:

If the eigenvalues of A are located in only a few narrow clusters, then the CG method requires only a few iterations to converge. Convergence is slower, though, if the eigenvalues are widely spread.

which makes me wonder: why does CG converge fast if the eigenvalues of $A$ are located in only a few narrow clusters?

This is not the first time I've seen observations like this, where the eigenvalues and eigenvectors of the matrix involved in an iterative method somehow determine the method's behavior or performance.

My questions, apart from the specific case above, are:

  1. What are the relations between the eigenvectors and eigenvalues of the matrices involved in an iterative method and the behaviour of the iterative method (if this can be generalized)?

  2. What are the general behaviors of iterative methods that we can predict given the eigenvectors or eigenvalues?

If this can't be generalized, I could either ask more specific questions, or you could list a few examples where eigenvectors and eigenvalues influence different iterative methods. Of course, an exhaustive list would be nice for everyone.

Note: I know what eigenvectors and eigenvalues are, but when I think about them I can't really make sense of why they are useful. You could argue that they are useful, as in the example above, but I can't see the relation clearly.

Best Answer

Iterative methods can be divided into two big groups:

  • Stationary methods, which are based on a fixed-point iteration idea
  • Krylov methods, which are projection methods (these are the ones relevant to your question)

See @Christian_Clason's answer for more details.
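For stationary methods the link to eigenvalues is direct: the iteration $x_{k+1} = M x_k + c$ converges for every starting point if and only if the spectral radius $\rho(M) < 1$, and $\rho(M)$ also gives the asymptotic rate. A minimal sketch (Python with NumPy; the example matrices are my own, chosen to show both outcomes) for the Jacobi iteration:

```python
import numpy as np

def jacobi_spectral_radius(A):
    """Spectral radius of the Jacobi iteration matrix M = I - D^{-1} A,
    where D = diag(A). The Jacobi iteration converges iff this is < 1."""
    D_inv = np.diag(1.0 / np.diag(A))
    M = np.eye(len(A)) - D_inv @ A
    return max(abs(np.linalg.eigvals(M)))

# Strictly diagonally dominant -> Jacobi converges
A_good = np.array([[4.0, 1.0], [1.0, 3.0]])
# Off-diagonal entries dominate -> Jacobi diverges here
A_bad = np.array([[1.0, 2.0], [2.0, 1.0]])

print(jacobi_spectral_radius(A_good))  # < 1
print(jacobi_spectral_radius(A_bad))   # > 1
```

So for this family of methods the eigenvalues of the iteration matrix (not of $A$ itself) answer both of your questions at once: convergence yes/no, and how fast.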

What are the relations between the eigenvectors and eigenvalues of the matrices involved in an iterative method and the behaviour of the iterative method (if this can be generalized)?

What are general behaviours of iterative methods that we can predict given the eigenvectors or eigenvalues?

For CG, the relation with the eigenvalues is given by the error estimate involving the condition number $k = \lambda_{\max} / \lambda_{\min}$: $$ \| x_\star - x_m\|_A \leq 2 \left[ \frac{\sqrt{k} - 1}{\sqrt{k} + 1} \right]^m \| x_\star - x_0\|_A. $$ See page 187 of your book, or [1], page 215, equation (6.128), where you can also find the proof.
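To get a feel for the bound, here is a small sketch (NumPy assumed, $A$ symmetric positive definite; the example matrix is arbitrary) that evaluates the factor $2\,[(\sqrt{k}-1)/(\sqrt{k}+1)]^m$:

```python
import numpy as np

def cg_bound_factor(A, m):
    """Evaluate 2 * ((sqrt(k)-1)/(sqrt(k)+1))**m with k = lambda_max/lambda_min,
    the bound on ||x* - x_m||_A / ||x* - x_0||_A after m CG iterations.
    Assumes A is symmetric positive definite."""
    eigs = np.linalg.eigvalsh(A)        # ascending eigenvalues
    k = eigs[-1] / eigs[0]
    return 2 * ((np.sqrt(k) - 1) / (np.sqrt(k) + 1)) ** m

A = np.diag([1.0, 2.0, 4.0, 100.0])     # k = 100
for m in (1, 10, 50):
    print(m, cg_bound_factor(A, m))
```

Note how the factor decays geometrically in $m$, with a rate controlled entirely by $k$: for $k = 1$ the factor is already 0 at the first step.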

Note:

  • the use of the condition number in such estimates can be considered a "characteristic" of Krylov methods;
  • for each algorithm you can find the appropriate estimate, see [1].

Considering the case:

If the eigenvalues of A are located in only a few narrow clusters

then $\lambda_{\max}$ and $\lambda_{\min}$ are also close to each other, so $k$ is smaller; the perfect case is $k = 1$, and realistically you want $k$ as close to 1 as possible. The smaller $k$ is, the fewer iterations you need for convergence. (More precisely, even when the clusters are far apart, CG needs roughly one iteration per cluster: a low-degree polynomial can be made small on a few narrow clusters at once.)
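A quick numerical illustration (a hand-rolled CG in Python on diagonal SPD systems, so the spectrum can be chosen directly; sizes and tolerance are arbitrary):

```python
import numpy as np

def cg_iterations(diag_spectrum, tol=1e-10, maxiter=1000):
    """Plain conjugate gradient on the diagonal SPD system diag(d) x = b;
    returns the number of iterations to reduce the residual norm below tol."""
    d = np.asarray(diag_spectrum, dtype=float)
    b = np.ones_like(d)
    x = np.zeros_like(d)
    r = b - d * x                 # residual; A is diagonal, so A @ x == d * x
    p = r.copy()
    rs = r @ r
    for k in range(1, maxiter + 1):
        Ap = d * p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            return k
        p = r + (rs_new / rs) * p
        rs = rs_new
    return maxiter

clustered = np.concatenate([np.full(50, 1.0), np.full(50, 2.0)])  # 2 clusters
spread = np.linspace(1.0, 1000.0, 100)                            # wide spread
print(cg_iterations(clustered))   # converges in a couple of iterations
print(cg_iterations(spread))      # needs many more
```

With only two distinct eigenvalues, CG converges in two iterations (in exact arithmetic a polynomial of degree 2 kills the error exactly); the widely spread spectrum needs far more, as the $k$-based estimate predicts.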

Note: This is the purpose of the preconditioning technique: replace the original linear system with an equivalent one whose spectral properties are better (see pp. 187-88 of your book, or the dedicated chapters of [1]), i.e., such that the eigenvalues of the preconditioned matrix $P^{-1}A$ are clustered around 1.
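As a toy illustration (NumPy; the matrix construction and the choice of Jacobi preconditioning $P = \operatorname{diag}(A)$ are my own assumptions, not from the book): for an SPD matrix whose bad conditioning comes from very different row scales, diagonal preconditioning pulls the spectrum into a cluster around 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# SPD matrix A = S (I + E) S whose bad conditioning comes from the scales in S
d = 10.0 ** rng.uniform(-3, 3, n)          # wildly different scales
E = rng.standard_normal((n, n)) * 0.01
E = (E + E.T) / 2                          # small symmetric perturbation
S = np.diag(np.sqrt(d))
A = S @ (np.eye(n) + E) @ S                # SPD since ||E|| < 1

def cond_spd(B):
    """Condition number lambda_max / lambda_min of a symmetric PD matrix."""
    ev = np.linalg.eigvalsh(B)
    return ev[-1] / ev[0]

# Jacobi preconditioner P = diag(A); P^{-1/2} A P^{-1/2} is similar to P^{-1} A,
# so it has the same eigenvalues but stays symmetric.
P_half_inv = np.diag(1.0 / np.sqrt(np.diag(A)))
print(cond_spd(A))                             # huge
print(cond_spd(P_half_inv @ A @ P_half_inv))   # close to 1
```

The eigenvalues of the preconditioned matrix all sit near 1, so by the estimate above CG on the preconditioned system needs very few iterations.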


[1] Saad, Yousef. *Iterative Methods for Sparse Linear Systems*. SIAM, 2003.