[Math] Rank computation of large matrices

Tags: eigenvalues-eigenvectors, matrices

I need to compute the rank of large (million-by-million) sparse matrices; I don't want to compute eigenvalues or eigenvectors, just the rank.
It was suggested to me that I use the Lanczos algorithm, which I read about on Wikipedia.

The Lanczos algorithm gives me a symmetric tridiagonal matrix $T = V^* A V$ (where $A$ is the input matrix), but it is not explained there how to compute the rank of the matrix from $T$.

My question: if I run a tridiagonal-matrix algorithm on $T$ and find its rank, will it be equal to the rank of $A$? If not, please suggest another simple algorithm/method for this.

Thanks in advance.

P.S. I am a computer engineering student. I have very limited knowledge of matrices and do not understand complicated notation (sorry for that).

Best Answer

It is not an easy task to compute the rank of a large sparse matrix. This is mainly because the numerically reliable rank-revealing decompositions (QR, SVD) are very costly for such matrices: it is hard to control the sparsity of the factors involved in the decomposition, unlike with, e.g., the LU factorisation, which is usually friendlier with respect to the sparsity of the factors.

If you had the full tridiagonal matrix $T$ from the Lanczos algorithm (LA), it would be quite easy to compute its eigenvalues (e.g., using an iterative diagonalisation based on Jacobi rotations), and the rank of the (symmetric) matrix $A$ would then simply be the number of non-zero eigenvalues of $T$.
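
To make this concrete, here is a minimal Python sketch (assuming SciPy is available; the helper name `tridiag_rank` and the tolerance heuristic are my own, not part of the answer) that counts the numerically non-zero eigenvalues of a symmetric tridiagonal matrix given its two diagonals:

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

def tridiag_rank(d, e, tol=None):
    """Numerical rank of the symmetric tridiagonal matrix with main
    diagonal d (length n) and off-diagonal e (length n-1)."""
    w = eigh_tridiagonal(d, e, eigvals_only=True)
    if tol is None:
        # heuristic threshold (an assumption, not from the answer):
        # n * machine epsilon * largest eigenvalue magnitude
        tol = len(d) * np.finfo(float).eps * np.abs(w).max()
    return int((np.abs(w) > tol).sum())

# toy check: [[1, 1], [1, 1]] has eigenvalues 0 and 2, hence rank 1
print(tridiag_rank(np.array([1.0, 1.0]), np.array([1.0])))  # -> 1
```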

However, I would not recommend this algorithm for this problem, for several reasons:

  • First, it works only for symmetric matrices. There is a non-symmetric version as well, but it has even worse properties than the symmetric version (whose problems I describe next).

  • In order to compute the rank of the symmetric matrix, you would need to run LA "until the end", that is, obtain the full factorisation of $A$ with $T$ having the same dimension as $A$. LA is not meant to work like this in practice. First, there is a severe loss of orthogonality among the vectors stored in $V$ due to rounding errors, so to get a "reliable" $T$ you would need some sort of reorthogonalisation, and for that you would need to store the whole basis $V$, which is a huge memory problem at the very least (see the sketch after this list). This is not needed in theory, and it is also not done in some related methods, such as the conjugate gradient method for solving linear systems, but finite-precision arithmetic is not a friend of LA. Moreover, since a $k \times k$ unreduced tridiagonal matrix has rank at least $k-1$, you would need to detect the so-called "happy breakdowns" reliably, which is almost impossible due to the highly nonlinear nature of LA and its high sensitivity.
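
To illustrate the memory problem from the second point, here is a rough Python sketch of the symmetric Lanczos iteration with full reorthogonalisation (an illustrative toy under my own assumptions, not production code; in particular, the breakdown tolerance `1e-12` is an arbitrary choice, and picking it reliably is exactly the delicate part):

```python
import numpy as np

def lanczos(A, v0, k):
    """Symmetric Lanczos with full reorthogonalisation (sketch).
    Returns the diagonal alpha and off-diagonal beta of T, plus the
    basis V. The reorthogonalisation needs ALL previous columns of V,
    an O(n*k) storage cost that is prohibitive for n ~ 10^6."""
    n = A.shape[0]
    V = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k - 1)
    V[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(k):
        w = A @ V[:, j]
        alpha[j] = V[:, j] @ w
        w -= alpha[j] * V[:, j]
        if j > 0:
            w -= beta[j - 1] * V[:, j - 1]
        # full reorthogonalisation against all previous basis vectors;
        # unnecessary in exact arithmetic, unavoidable in floating point
        w -= V[:, : j + 1] @ (V[:, : j + 1].T @ w)
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            if beta[j] < 1e-12:  # a "happy breakdown" -- hard to detect reliably
                return alpha[: j + 1], beta[:j], V[:, : j + 1]
            V[:, j + 1] = w / beta[j]
    return alpha, beta, V
```

For a million-by-million matrix, keeping the whole basis $V$ means storing thousands of dense vectors of length $10^6$ each, which is the memory problem mentioned above.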

Of course, I am not saying that LA is useless; it is just not suitable for this problem.

What I would suggest instead:

  • Use the structural rank, which is obtained by finding a maximum matching in the bipartite graph associated with the non-zero pattern of $A$. It provides an upper bound on the rank of $A$, and the structural rank equals the (ordinary) rank with probability 1 for generic values of the non-zero entries. However, this approach fails to detect, e.g., the rank deficiency in matrices associated with PDEs with Neumann boundary conditions. It is implemented in the sprank function in MATLAB; see the Python sketch after this list.
  • You can also have a look at the SPQR package (also MATLAB) in the SuiteSparse library. It has a function for computing the nullspace of a sparse matrix (from which the rank can be deduced). It is based on sparse orthogonalisation and hence can be quite costly.
  • I also remember a presentation at a workshop on a certain sparse direct solver that might provide more hints.
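
For the first suggestion, here is a small Python sketch using scipy.sparse.csgraph.structural_rank (SciPy's counterpart of MATLAB's sprank; the toy matrix is my own) showing that the structural rank only bounds the true rank from above:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import structural_rank

# Rows 0 and 1 are proportional, so the numerical rank is 2, but the
# non-zero pattern matches all three rows to distinct columns, so the
# structural rank is 3 -- an upper bound only.
A = csr_matrix(np.array([[1.0, 1.0, 0.0],
                         [2.0, 2.0, 0.0],
                         [0.0, 0.0, 3.0]]))
print(structural_rank(A))                  # -> 3
print(np.linalg.matrix_rank(A.toarray()))  # -> 2
```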