MATLAB: Sparse Matrix build efficiency

inversionskylinesparsetime efficiency

Hello everyone.
I want to build a nxn matrix A as sparse for efficiency( time of the inversion(A\)). i)I would like to ask is it ok if I define it with the command A=sparse([..],[..],[..],n,n) with one or a few non zero elements at the beginning and then add the rest of the elements with commands like the following: A(..:..,..:..)=..; or will it cause a problem with time efficiency like when we change the size of a regular matrix or vector inside a loop and we get the warning "consider preallocating for speed"?
I can't directly compare it in terms of time efficiency with the case of building the three necessary vectors [..] from the beginning and use only the command sparse, because I have to change many things in my code to do that.
ii) Is it nessesary or possible in MATLAB to store a matrix in skyline format along with the corresponding vector of the positions of the diagonal elements?
Thank you in advance

Best Answer

In general, DON'T DO IT! That is, don't build a sparse matrix one element at a time.
Even if you use spalloc, the matrix will be inefficient to build.
For example, I'll build a sparse matrix one element at a time below.
N = 100000;
M = spalloc(1000,1000,N);
tic
k = N + 1;
for i = 1:N
k = k - 1;
M(k) = i;
end
toc
Elapsed time is 6.069033 seconds.
So I stuffed the matrix to its allocated capacity, at 10% non-zeros. I did it by inserting elements from the back end first, so every time I put a new element in, I believe the sparse insert forces the elements to be re-shuffled in memory.
Next, I'll stuff them in the same order they will actually be stored in memory. So no reshuffling is necessary.
N = 100000;
M = spalloc(1000,1000,N);
tic
k = 0;
for i = 1:N
k = k + 1;
M(k) = i;
end
toc
Elapsed time is 0.196214 seconds.
This time it was MUCH faster. The problem is, if you do the stuffing in random order, it will almost always be true that SOME amount of reshuffling must be done. So most of the time, the time will be like the first call.
N = 100000;
ind = randperm(1e6,N);
M = spalloc(1000,1000,N);
tic
for i = 1:N
k = ind(i);
M(k) = i;
end
toc
Elapsed time is 2.980270 seconds.
So it took 6 seconds for the worst case. and half that for the random sequence of elements. But if I built it in the best possible order, it took 0.2 seconds.
Again, the point is, DON'T DO IT!
Instead, build the array using sparse, in ONE single call. This will force you to create a list of non-zero elements upfront. Preallocate your matrices. Never grow them dynamically.
tic
[rind,cind] = ind2sub([1000,1000],ind);
M = sparse(rind,cind,1:N,1000,1000);
toc
Elapsed time is 0.036842 seconds.
You should notice that this call took significantly less time than the best possible case to allocate that sparse matrix before (using a loop).
If you are going to use sparse matrices, then learn to use the sparse matrix tools. These tools are of huge benefit, but if you use them poorly, you will still get poor performance.
Ok, yes, you THINK you cannot use sparse. In fact, you are simply wrong in that. You can build the elements in advance, storing them as a list of row and column indices, plus a list of values. If you know in advance how many elements you will have, just preallocate those vectors as FULL vectors, stuffing them one at a time as you compute the elements. When the last element is created, call sparse, ONCE. ONE TIME.
If you don't know how many elements you will have at the end, I provide code to build arrays incrementally here, found on the file exchange for download.