Numerical Methods – Intuition Behind Matrix Splitting Methods (Jacobi, Gauss-Seidel)

numerical-linear-algebra · numerical-methods

Descent methods, like gradient descent and conjugate gradient, have a nice geometric interpretation and I really love them. What about Jacobi, Gauss-Seidel, or other matrix splitting methods? I can't see any intuition behind the formulas.

A posteriori I know that these methods work because of the Banach fixed-point theorem, but I still need a more stimulating picture.

Any ideas?

Best Answer

I'm not really sure what you mean by intuition behind the methods, but all classical iterations for solving $Ax=b$ are based on a splitting $$ A=M-N, $$ where

  • $M$ is nonsingular and easy to invert,
  • $M\approx A$ in some sense.

The equation $Ax=b$ is then equivalent to $$Mx=Nx+b=Mx+(b-Ax),$$ which "naturally" defines a fixed-point iteration $$ x_k=M^{-1}Nx_{k-1}+M^{-1}b=x_{k-1}+M^{-1}(b-Ax_{k-1}). $$ Note that the second expression is more suitable in practice as it does not involve the "residual" matrix $N=M-A$.
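In code, the second form of the iteration is a one-line update. Here is a minimal NumPy sketch for a generic nonsingular $M$; the function name, tolerance, and iteration cap are illustrative choices, not part of the answer above.

```python
import numpy as np

def splitting_iteration(A, b, M, x0, tol=1e-10, max_iter=500):
    """Generic splitting iteration x_k = x_{k-1} + M^{-1}(b - A x_{k-1})."""
    x = x0.copy()
    for _ in range(max_iter):
        r = b - A @ x                  # residual of the current iterate
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        x = x + np.linalg.solve(M, r)  # "apply M^{-1}"; M is assumed easy to invert
    return x
```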

There are many ways to choose $M$ (assume $A=D+L+U$ is the splitting of $A$ into its diagonal, strictly lower triangular, and strictly upper triangular parts):

  • $M=\alpha I$: Richardson method,
  • $M=D$: Jacobi method,
  • $M=D+L$ or $M=D+U$: forward and backward Gauss-Seidel method,
  • $M=(D+L)D^{-1}(D+U)$: symmetric Gauss-Seidel method,
  • others, like SOR, SSOR, HSS, and block splittings; you can also realize $M$, e.g., by incomplete factorizations or approximate inverses. There are literally tons of possibilities (see the sketch below).
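As an illustration of the list above, here is a sketch (under the $A=D+L+U$ convention) of how a few of these matrices $M$ are assembled; the dictionary keys and the Richardson parameter `alpha` are arbitrary example names, not standard API.

```python
import numpy as np

def splitting_matrices(A, alpha=1.0):
    """Build a few classical choices of M from A = D + L + U."""
    D = np.diag(np.diag(A))   # diagonal part
    L = np.tril(A, k=-1)      # strictly lower triangular part
    U = np.triu(A, k=1)       # strictly upper triangular part
    return {
        "richardson":   alpha * np.eye(A.shape[0]),
        "jacobi":       D,
        "forward_gs":   D + L,
        "backward_gs":  D + U,
        "symmetric_gs": (D + L) @ np.linalg.inv(D) @ (D + U),
    }
```

Any of these matrices can be passed as `M` to the generic iteration sketched earlier.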

Splitting methods have the property that the error is transformed in the same way independently of the iteration: $$ x-x_k=(I-M^{-1}A)(x-x_{k-1})=(I-M^{-1}A)^k(x-x_0). $$ From the spectral point of view, this has the advantage that (in contrast with CG or steepest descent) the spectral components of the initial error are damped at (asymptotically) the same rate from iteration to iteration. This has an important application, e.g., in multigrid methods, where methods like Jacobi and Gauss-Seidel are often used as smoothers, since they efficiently damp (for PDE-related problems) the oscillatory error components associated with large eigenvalues.
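To see this numerically, one can form the error propagation matrix $I-M^{-1}A$ and look at its spectrum; the small tridiagonal test matrix below is just an illustration, not taken from the answer.

```python
import numpy as np

A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])
M = np.diag(np.diag(A))                 # Jacobi choice M = D
G = np.eye(3) - np.linalg.solve(M, A)   # error propagation matrix I - M^{-1}A
eigvals = np.linalg.eigvals(G)
print(np.abs(eigvals))                  # per-component damping factors
print(np.max(np.abs(eigvals)))          # spectral radius: asymptotic convergence rate
```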

Another particular motivation for the Jacobi and Gauss-Seidel methods is that they zero out certain components of the residual vector $b-Ax$. Given an approximation $x_k$, the $i$th component of the Jacobi iterate $x_{k+1}$ is computed simply by solving the $i$th equation of $Ax=b$: $$\tag{$*$} (x_{k+1})_i=\frac{1}{a_{ii}}\left(b_i-\sum_{j\neq i}a_{ij}(x_k)_j\right). $$ If $z_i=[(x_k)_1,\ldots,(x_k)_{i-1},(x_{k+1})_i,(x_k)_{i+1},\ldots,(x_k)_n]^T$, we have $(b-Az_i)_i=0$. If the Jacobi update is executed in a loop for $i=1,\ldots,n$, the components $(x_{k+1})_j$, $j=1,\ldots,i-1$, can already be used on the right-hand side of ($*$), which leads to the update $$\tag{$**$} (x_{k+1})_i=\frac{1}{a_{ii}}\left(b_i-\sum_{j<i}a_{ij}(x_{k+1})_j-\sum_{j>i}a_{ij}(x_k)_j\right), $$ that is, the (forward) Gauss-Seidel iteration. Switching the loop order and the indices $k$ and $k+1$ on the right-hand side of ($**$) gives the backward Gauss-Seidel iteration. Again, we can interpret it as zeroing out a certain component of the residual vector.
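The formulas ($*$) and ($**$) translate directly into component-wise sweeps. This is a minimal sketch, with function names of my own choosing: the Jacobi sweep reads only the old iterate, while the forward Gauss-Seidel sweep reuses components already updated within the loop.

```python
import numpy as np

def jacobi_sweep(A, b, x):
    """One Jacobi sweep: formula (*) applied for i = 1, ..., n using only the old x."""
    x_new = x.copy()
    for i in range(len(b)):
        s = A[i, :] @ x - A[i, i] * x[i]          # sum over j != i of a_ij * (x_k)_j
        x_new[i] = (b[i] - s) / A[i, i]
    return x_new

def gauss_seidel_sweep(A, b, x):
    """One forward Gauss-Seidel sweep: formula (**), reusing updated components."""
    x = x.copy()
    for i in range(len(b)):
        s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]   # x[:i] already holds new values
        x[i] = (b[i] - s) / A[i, i]
    return x
```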