It's not really about the matrices. It's about how the cost of certain algorithms, data structures, procedures, etc., relates to the size of the matrix and the number of non-zero elements.
For example, if a data structure is designed for storing sparse matrices, it means that it can store a matrix in space proportional to the number of non-zero elements. Simpler data structures for storing matrices (like 2-dimensional arrays) take space proportional to the size of the matrix.
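To make this concrete, here is a minimal sketch (in Python with scipy.sparse, which is an assumption about your environment) comparing the two storage costs for a matrix with only $n$ non-zeros:

    import numpy as np
    from scipy import sparse

    n = 5_000
    # Dense storage: space proportional to the size of the matrix
    # (n*n entries), no matter how many of them are zero.
    dense = np.zeros((n, n))             # 5000 * 5000 * 8 bytes = 200 MB
    # Sparse (COO) storage: space proportional to the number of non-zeros.
    ident = sparse.eye(n, format="coo")  # only the n diagonal entries
    print(dense.nbytes)                  # 200000000
    print(ident.data.nbytes + ident.row.nbytes + ident.col.nbytes)  # ~80000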
If you're working with a sparse matrix data structure, it's probably because your matrix is large, but sparse, and you rely on the sparseness of the matrix to meet your requirements for the size of the data structure in memory.
Similarly, if an algorithm is designed to work with sparse matrices, then its cost is defined in terms of the number of non-zero elements in the matrix rather than the matrix's size. If you're using an algorithm like that, then again it's probably because your matrix is large but sparse, and you rely on the sparseness of the matrix to meet your requirements for the speed of the computation.
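As an illustrative sketch (not any particular library's internals), here is a matrix-vector product over the CSR storage format; the inner loop body runs once per stored non-zero, so the work is proportional to the number of non-zero elements rather than to rows*columns:

    import numpy as np

    def csr_matvec(data, indices, indptr, x):
        # y = A @ x for A stored in CSR form (data, indices, indptr).
        y = np.zeros(len(indptr) - 1)
        for i in range(len(y)):                        # for each row i
            for k in range(indptr[i], indptr[i + 1]):  # stored entries only
                y[i] += data[k] * x[indices[k]]
        return y

This is essentially what scipy.sparse.csr_matrix does for you in compiled code.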
So, in summary, there is no specific density at which the matrix changes from sparse to dense. When you start finding it useful to use data structures and algorithms that are optimized for sparse matrices, then you start thinking about the costs in terms of the number of non-zero elements, and at that point you are using sparse matrices.
Also, a class of matrices (e.g., diagonal, tridiagonal, etc.) is generally called sparse if the number of non-zero elements is proportional to the number of rows or columns rather than to rows*columns. This gives sparse matrix algorithms an advantage in computational complexity (big O), meaning that they will always perform better on sufficiently large matrices in that class.
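For example, a tridiagonal matrix of order $n$ has at most $3n - 2$ non-zeros, as a quick sketch with scipy.sparse confirms:

    from scipy import sparse

    for n in (100, 1_000, 10_000):
        T = sparse.diags([1.0, -2.0, 1.0], offsets=[-1, 0, 1],
                         shape=(n, n), format="csr")
        print(n, T.nnz)  # nnz = 3n - 2: proportional to n, not n*n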
Too long for a comment
Python is probably not the best way to go. Even loading the whole matrix into memory is probably not possible at all in your case.
Most numerical methods rely on multiplying the matrix by an input vector, so instead of generating the matrix and then multiplying it by a given vector, you can create the matrix elements on the fly.
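In Python, one way to express this matrix-free idea is scipy's LinearOperator, to which you supply only the rule for applying the matrix to a vector; the stencil below is a hypothetical stand-in for whatever rule generates your entries:

    import numpy as np
    from scipy.sparse.linalg import LinearOperator

    n = 10_000_000  # far too large to ever store densely

    def matvec(x):
        # Compute A @ x by generating the matrix elements on the fly;
        # here A is a hypothetical 1D Laplacian stencil.
        y = 2.0 * x
        y[:-1] -= x[1:]
        y[1:] -= x[:-1]
        return y

    A = LinearOperator((n, n), matvec=matvec, dtype=np.float64)
    v = np.random.default_rng(0).standard_normal(n)
    print(A @ v)  # the matrix itself is never formed

Operators like this can be passed directly to scipy's iterative solvers and eigensolvers (cg, gmres, eigsh, ...), which only ever need the action of the matrix on a vector.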
I've used ScaLAPACK (http://www.netlib.org/scalapack/) in the past and it is very powerful, but it does require knowing C/Fortran. However, if you really are out of your comfort zone with these languages, you can always write a wrapper for Python.
Best Answer
It is not an easy task to compute the rank of a large sparse matrix. This is mainly because computing the numerically reliable rank-revealing decompositions (QR, SVD) is very costly for such matrices, particularly because it is not easy to control the sparsity of the factors involved in the decomposition, unlike with, e.g., the LU factorisation, which is usually more friendly with respect to the sparsity of the factors.
If you had the tridiagonal matrix $T$ from the Lanczos algorithm (LA), it would be quite easy to compute its eigenvalues (e.g., using an iterative diagonalisation based on Jacobi rotations), and hence the rank of the (symmetric) matrix $A$ would then simply be given by the number of non-zero eigenvalues of $T$.
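Purely to illustrate that final counting step (not as a recommendation, for the reasons below), assuming you somehow had the full $T$, a sketch with hypothetical data would be:

    import numpy as np
    from scipy.linalg import eigh_tridiagonal

    # Hypothetical main diagonal d and off-diagonal e of a completed T.
    d = np.array([2.0, 2.0, 2.0, 0.0])
    e = np.array([1.0, 1.0, 0.0])

    w = eigh_tridiagonal(d, e, eigvals_only=True)
    tol = 1e-10 * np.max(np.abs(w))  # treat tiny eigenvalues as zeros
    print(np.sum(np.abs(w) > tol))   # rank of T (= rank of A); here 3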
However, I would not recommend using this algorithm, for several reasons:
First, it works only for symmetric matrices. There is a non-symmetric version as well, but it has even worse properties than the symmetric version (which I will mention later).
Second, in order to compute the rank of the symmetric matrix, you would need to run LA "until the end", that is, to get the full factorisation of $A$ with $T$ having the same dimension as $A$. LA is not meant to be used like this in practice. There is a severe loss of orthogonality among the Lanczos vectors stored in $V$ due to rounding errors, so to obtain a "reliable" $T$ you would need some sort of reorthogonalisation, and hence you would need to store the whole basis $V$ (which, of course, is at the very least a huge memory problem). This is not needed in theory, and is also not done in some related methods, like the conjugate gradient method for solving linear systems, but finite precision arithmetic is not a friend of LA. Also, since a $k\times k$ unreduced tridiagonal matrix has rank at least $k-1$, you would need to reliably capture so-called "happy breakdowns", which is almost impossible due to the highly nonlinear nature of LA and its high sensitivity.
Of course, I am not saying that LA is useless; it is just not at all suitable for this problem.
What I would suggest is to use the sprank function in MATLAB.
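Note that sprank computes the structural rank, which is an upper bound on the numerical rank. If you would rather stay in Python, scipy offers an analogous routine (a sketch, assuming a reasonably recent scipy; the random matrix stands in for your own data):

    from scipy import sparse
    from scipy.sparse.csgraph import structural_rank

    A = sparse.random(1_000, 1_000, density=1e-3,
                      format="csr", random_state=0)
    print(structural_rank(A))  # upper bound on the numerical rank of A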