Solve banded linear system with large bandwidth but sparse interior band structure

linear algebra, matrix decomposition, partial differential equations, sparse matrices

Consider the linear system $Ax = b$, where $A$ is an $N \times N$ banded matrix with lower and upper bandwidth $l$, and $N \gg l \gg 1$. $A$ has the following structure:

All entries of $A$ are zero except for the following: the main diagonal and the first super- and sub-diagonal bands are nonzero, and so are the $(l-2)$th, $(l-1)$th and $l$th super- and sub-diagonal bands. Therefore, $A$ contains 9 nonzero bands.

The system can be solved with a standard banded $LU$ decomposition, whose cost scales as $N \times l^2$ for $l \ll N$. However, this method does not exploit the fact that all bands strictly between the first off-diagonals and the $(l-2)$th off-diagonals are identically zero.
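To see concretely why the zero interior bands are not exploited, here is a minimal pure-Python sketch of banded Gaussian elimination without pivoting. It uses dense storage for readability, but the loops only touch entries with $|i-j| \le l$, so the work is $O(Nl^2)$. The test matrix is a hypothetical, strictly diagonally dominant matrix with the 9-band pattern above, and the helper names are illustrative. After factorization, the bands at offsets $2, \dots, l-3$ fill in, so the factors are full within the band.

```python
def nine_band_matrix(n, l):
    """Hypothetical test matrix with nonzero bands at offsets 0, ±1, ±(l-2), ±(l-1), ±l."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 10.0  # strong diagonal, so elimination without pivoting is safe
        for d in (1, l - 2, l - 1, l):
            if i + d < n:
                A[i][i + d] = -1.0
                A[i + d][i] = -1.0
    return A


def banded_lu(A, l):
    """In-place LU without pivoting (L has unit diagonal, stored below it).
    Only entries with |i - j| <= l are touched, so the cost is O(n * l^2)."""
    n = len(A)
    for k in range(n):
        for i in range(k + 1, min(k + l + 1, n)):
            if A[i][k] == 0.0:
                continue                      # zero multiplier: row i unchanged
            A[i][k] /= A[k][k]                # multiplier l_ik
            for j in range(k + 1, min(k + l + 1, n)):
                A[i][j] -= A[i][k] * A[k][j]  # this update creates the fill-in
    return A


def banded_solve(LU, b, l):
    """Forward substitution with unit-diagonal L, then back substitution with U."""
    n = len(LU)
    x = list(b)
    for i in range(n):
        for j in range(max(0, i - l), i):
            x[i] -= LU[i][j] * x[j]
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, min(n, i + l + 1)):
            x[i] -= LU[i][j] * x[j]
        x[i] /= LU[i][i]
    return x


def nonzero_offsets(A):
    """Sorted list of diagonal offsets j - i that hold at least one nonzero."""
    n = len(A)
    return sorted({j - i for i in range(n) for j in range(n) if A[i][j] != 0.0})
```

With $l = 6$, the original matrix has offsets $0, \pm 1, \pm 4, \pm 5, \pm 6$, while its factors occupy every offset from $-6$ to $6$.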

Such a system arises in the discretization of second-order, two-dimensional differential equations with mixed derivatives, using a three-point compact stencil at interior nodes and a two-point stencil at boundary nodes.
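The band positions follow from the stencil. Assume the unknowns of a $k \times k$ grid are numbered row by row (index $= rk + c$), which is an assumption about the ordering. A nine-point stencil, which mixed derivatives require, couples each node to its 8 neighbours. The resulting offsets are $0, \pm 1, \pm(k-1), \pm k, \pm(k+1)$, that is, $0, \pm 1, \pm(l-2), \pm(l-1), \pm l$ with $l = k + 1 \approx \sqrt{N}$. A small pure-Python check (the function name is hypothetical):

```python
def nine_point_offsets(k):
    """Offsets j - i between unknowns coupled by a 3x3 (nine-point) stencil
    on a k-by-k grid numbered row by row (index = r*k + c)."""
    offsets = set()
    for r in range(k):
        for c in range(k):
            i = r * k + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < k and 0 <= cc < k:   # stay inside the grid
                        offsets.add(rr * k + cc - i)
    return sorted(offsets)


print(nine_point_offsets(5))  # k = 5, so l = 6
```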

Apart from bandwidth-reducing algorithms, is there a faster method for solving such a system that exploits the sparse band structure described here?

Best Answer

Note that standard banded $LU$ factorization for a matrix of bandwidth $\ell$ has time complexity $O(N \ell^2)$.${}^*$ In your question, you specifically address discretized differential equations.

There has been considerable work on iterative methods for such discretizations. The field is vast, but a popular choice is a Krylov subspace method preconditioned with, say, algebraic multigrid. For many PDEs, such methods provably compute an approximate solution to a specified tolerance in $O(N)$ time.
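As a rough illustration of why iterative methods benefit from the sparsity, here is a minimal pure-Python sketch of conjugate gradients with a Jacobi (diagonal) preconditioner. The Jacobi preconditioner is only a simple stand-in for algebraic multigrid, which would be far more effective on real PDE problems. Because only the 9 diagonals are stored, each matrix-vector product costs $O(9N)$ regardless of $\ell$. CG assumes $A$ is symmetric positive definite. A mixed-derivative discretization may not be, and in that case GMRES or BiCGSTAB would replace CG. The matrix built by the hypothetical `make_bands` helper is an SPD test case.

```python
import math


def make_bands(n, l, diag=10.0, off=-1.0):
    """Hypothetical SPD test matrix stored by diagonals as {offset: values}."""
    bands = {0: [diag] * n}
    for d in (1, l - 2, l - 1, l):
        bands[d] = [off] * (n - d)   # super-diagonal: entries A[i][i+d]
        bands[-d] = [off] * (n - d)  # sub-diagonal: entries A[i+d][i]
    return bands


def band_matvec(bands, x):
    """y = A x touching only the stored diagonals: O(9 N) work."""
    y = [0.0] * len(x)
    for d, vals in bands.items():
        if d >= 0:
            for i, a in enumerate(vals):
                y[i] += a * x[i + d]
        else:
            for i, a in enumerate(vals):
                y[i - d] += a * x[i]
    return y


def pcg(bands, b, tol=1e-10, maxiter=1000):
    """Jacobi-preconditioned conjugate gradients; returns (x, iterations)."""
    dinv = [1.0 / a for a in bands[0]]
    x = [0.0] * len(b)
    r = list(b)
    z = [di * ri for di, ri in zip(dinv, r)]
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    bnorm = math.sqrt(sum(bi * bi for bi in b))
    for it in range(maxiter):
        Ap = band_matvec(bands, p)
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(sum(ri * ri for ri in r)) <= tol * bnorm:
            return x, it + 1
        z = [di * ri for di, ri in zip(dinv, r)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, maxiter
```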

Now let me address the question you actually asked. Given an arbitrary banded matrix of bandwidth $\ell$ with only $9$ nonzero diagonals, as specified in your question, does there exist a method faster than banded $LU$ factorization? If the answer is no, it would be very hard to prove, because you would have to show that every method that solves the problem has at least a certain time complexity. There is sophisticated and interesting work in theoretical computer science on exactly this kind of question. I admit I don't fully understand it, but it might yield very precise "lower bound" statements on how fast problems of this type can be solved.

Here is a specific result, which you may or may not already know, that might temper your expectations about the achievable time complexity. Informally, it can be stated as follows:

For any matrix $A$ fitting your description with $\ell \approx \sqrt{N}$, there is no reordering of the rows and columns of $A$ such that an $LU$ factorization of the reordered matrix can be computed asymptotically faster than $O(N^{3/2})$. Moreover, this time complexity is attained by reordering $A$ according to the nested dissection ordering.

This result is more or less a consequence of the results of this paper. The case $\ell \approx \sqrt{N}$ is important because this corresponds to the discretization of a 2D PDE on a square mesh. The result does not rule out some other algorithm solving the problem in faster than $O(N^{3/2})$ time, but it does rule out a class of algorithms which are "$LU$ factorization-like". My hunch is that $N^{3/2}$ is as good as you can get for this problem without some additional assumption (like $A$ being diagonally dominant, which is often the case for discretized PDEs).
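A rough operation count shows the size of the gap. It assumes the row-by-row numbering of a $k \times k$ grid, for which $N = k^2$ and $\ell = k + 1$, and it omits constants:

$$
\underbrace{O(N\ell^2)}_{\text{banded } LU} = O\big(k^2 (k+1)^2\big) = O(k^4) = O(N^2),
\qquad
\underbrace{O(N^{3/2})}_{\text{nested dissection}} = O(k^3).
$$

For $k = 1000$ this is roughly $10^{12}$ versus $10^{9}$ operations, a factor of about $\sqrt{N}$.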

I would look into numerical methods for linear systems arising specifically from discretized PDEs, for which there is additional structure (like perhaps diagonal dominance) which can open up certain algorithmic approaches that won't work on a fully general matrix $A$ with the sparsity pattern you describe.

${}^*$ To see why this must be the case, note that if $A$ is dense and thus $\ell = N-1$, then a complexity of $O(N\ell)$ would mean we could solve every system of linear equations in $O(N^2)$ operations, which has not been shown to be possible. $O(N\ell^2)$ recovers the standard $O(N^3)$ for Gaussian elimination.

Related Solutions

[Math] Fastest way to solve linear system with block symmetric banded/Toeplitz matrix

For each diagonal block $A$, it seems that you have a fixed small distance between your nonzero diagonals: a nonzero main diagonal, a nonzero 5th super/subdiagonal and a nonzero 10th super/subdiagonal. This spacing between the nonzero diagonals is critical. Because every nonzero offset is a multiple of 5, unknowns $i$ and $j$ are coupled only if $i \equiv j \pmod 5$.

You will be able to do a symmetric permutation $PAP^T$ of each diagonal block $A$ into a block diagonal matrix with many small blocks on the main diagonal. Here $P$ is a permutation matrix. These small blocks will be banded and dense within the band. I recommend the symmetric reverse Cuthill-McKee algorithm for this kind of problem. There is an implementation in MATLAB (symrcm) and there are free C implementations available online.
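When every nonzero offset is a multiple of some $g > 1$, the decoupling can also be written down directly, without a general reordering algorithm such as reverse Cuthill-McKee. List the indices with $i \equiv 0 \pmod g$ first, then those with $i \equiv 1$, and so on. Here is a minimal pure-Python sketch (the function name is hypothetical, and indices are 0-based):

```python
from functools import reduce
from math import gcd


def residue_permutation(n, offsets):
    """Return (perm, g), where perm[new] = old lists indices 0, g, 2g, ... first,
    then 1, 1+g, ..., and g is the gcd of the nonzero diagonal offsets.
    Unknowns in different residue classes mod g never interact, so the
    symmetrically permuted matrix is block diagonal with g blocks."""
    g = reduce(gcd, (abs(d) for d in offsets if d != 0))
    perm = [i for r in range(g) for i in range(r, n, g)]
    return perm, g
```

For offsets $0, \pm 5, \pm 10$ this gives $g = 5$ decoupled pentadiagonal blocks. If the offsets include $\pm 1$, then $g = 1$ and nothing decouples.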

At first it may seem astonishing that your diagonal blocks decouple into smaller disjoint blocks. To gain an understanding of this, I recommend that you first consider a 10 by 10 matrix with a nonzero diagonal, a nonzero 2nd superdiagonal and a nonzero 2nd subdiagonal. Applying the permutation $q = (9,7,5,3,1,10,8,6,4,2)$ to the rows and columns will give you two diagonal blocks of dimension 5 which are tridiagonal. The permutation is written in MATLAB format and you should interpret it as follows: put row 9 as row 1, put row 7 as row 2, put row 5 as row 3, etc. and similarly for the columns.
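The 10 by 10 example can be checked with a few lines of pure Python. Here `permuted` is a hypothetical helper that applies a 1-based, MATLAB-style index vector to both rows and columns, like `A(q, q)` in MATLAB:

```python
def permuted(A, q):
    """Symmetric permutation B = A(q, q) for a 1-based index vector q."""
    idx = [k - 1 for k in q]
    return [[A[i][j] for j in idx] for i in idx]


n = 10
# Nonzero main diagonal plus nonzero 2nd superdiagonal and 2nd subdiagonal.
A = [[1.0 if abs(i - j) in (0, 2) else 0.0 for j in range(n)] for i in range(n)]
q = [9, 7, 5, 3, 1, 10, 8, 6, 4, 2]
B = permuted(A, q)  # two decoupled 5-by-5 tridiagonal blocks
for row in B:
    print(" ".join("x" if v else "." for v in row))
```

The first block collects the odd-numbered rows (9, 7, 5, 3, 1) and the second the even-numbered ones. Consecutive entries of each block differ by 2 in the original numbering, so the 2nd off-diagonals become the 1st.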

This is not a bad place to start (as it will dramatically simplify your problem and expose massive parallelism and give you small banded systems to solve), but you may want to look into graph reorderings and the elimination game for sparse solvers in general.

I hope this helps.

EDIT: Once you have obtained small banded diagonal blocks, we can discuss which algorithm is fastest. That depends very much on the system you are programming for, but good data locality is critical on any architecture. Your blocks may be so small that you cannot use vector (SIMD) operations in the usual manner and will have to interleave the data representing different blocks, but that is doable.
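As a hypothetical illustration of that interleaving, here is a pure-Python sketch of the Thomas algorithm applied to many small tridiagonal blocks at once, such as the two blocks in the 10 by 10 example. Row $i$ of block $b$ is stored at index `i*nb + b`, so the innermost loop runs over blocks with unit stride. That is the access pattern a vectorizing compiler or SIMD intrinsics would exploit in C or Fortran; Python itself gains nothing from it. The Thomas algorithm does no pivoting, so it assumes the blocks are diagonally dominant or otherwise safe without pivoting.

```python
def batched_thomas(a, d, c, rhs, m, nb):
    """Solve nb independent tridiagonal systems of size m stored interleaved:
    row i of block b lives at index i*nb + b. a, d, c are the sub-, main and
    super-diagonals; a at row 0 and c at row m-1 do not affect the result.
    No pivoting is performed. Inputs are not modified."""
    cp = [0.0] * (m * nb)
    x = [0.0] * (m * nb)
    for b in range(nb):                    # row 0 of every block
        cp[b] = c[b] / d[b]
        x[b] = rhs[b] / d[b]
    for i in range(1, m):                  # forward sweep, one row of all blocks
        for b in range(nb):                # unit-stride loop over blocks
            k, km = i * nb + b, (i - 1) * nb + b
            denom = d[k] - a[k] * cp[km]
            cp[k] = c[k] / denom
            x[k] = (rhs[k] - a[k] * x[km]) / denom
    for i in range(m - 2, -1, -1):         # back substitution
        for b in range(nb):
            k = i * nb + b
            x[k] -= cp[k] * x[k + nb]
    return x
```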

Inverting a huge sparse banded matrix

To expand on George's answer, suppose we have a matrix consisting of $m \times m$ diagonal matrices $D_{ij}$ like this:

$$M = \left[\begin{array}{ccc} D_{11}&\cdots &D_{1n}\\\vdots&\ddots&\vdots\\D_{n1}&\cdots&D_{nn} \end{array}\right]$$

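We can always find a permutation matrix $P$ such that the similarity transformation

$$A = PMP^{t}$$

yields $A = \left[\begin{array}{ccc} B_{1}&0 &0\\0&\ddots&0\\0&0&B_{m} \end{array}\right]$, where each $B_k$ is an $n \times n$ block whose $(i,j)$ entry is the $k$-th diagonal entry of $D_{ij}$.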

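In GNU Octave, this permutation can be expressed as a sparse matrix, after which `A = P*M*P'`. The chained indexing `(...)(:)` is Octave-only; in MATLAB, wrap each index expression in `reshape(..., [], 1)` instead.

```octave
% each D_ij is m-by-m; row (k-1)*n+i of P picks entry k of block i, i.e. column (i-1)*m+k
P = sparse(((1:n)'+(0:(m-1))*n)(:), ((0:(n-1))'*m+(1:m))(:), ones(n*m,1));
```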

Inverting $A$ then reduces to inverting each $B_k$ separately, and since $P$ is orthogonal, $M^{-1} = P^{t} A^{-1} P$.
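The same index map can be checked without Octave. This pure-Python sketch uses 0-based indices, dense storage and hypothetical helper names. It builds `order[new] = old`, where new index $kn + i$ takes old index $im + k$ (entry $k$ of block row $i$). It then forms $A = PMP^t$ entrywise, which lets you confirm that only the $m$ diagonal blocks of size $n$ are nonzero:

```python
def block_gather_order(n, m):
    """0-based ordering order[new] = old for an n-by-n grid of m-by-m diagonal
    blocks: new index k*n + bi takes old index bi*m + k."""
    return [bi * m + k for k in range(m) for bi in range(n)]


def permute(M, order):
    """A = P M P^t, written entrywise as A[r][s] = M[order[r]][order[s]]."""
    return [[M[i][j] for j in order] for i in order]
```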
