I looked quickly at the code you were playing with in the original post.
The biggest problem was arguably that no arrays were preallocated. ALWAYS preallocate arrays that will be grown in size dynamically. This is perhaps the biggest reason for code (written by novices) running slowly.
Failing to preallocate an array that is then grown iteratively causes a quadratic time penalty. As the array size grows, the time required to dynamically re-allocate memory, then copying the entire array contents grows quadratically.
Of course, if you never grow any arrays or vectors, then there is no need to preallocate. That is often a consequence of "vectorization". But there are many ways to vectorize code. No single way exists. Often, vectorization just means re-thinking the code flow, re-thinking the basic algorithm. Very often vectorization means you need to trade off memory for external explicit loops. But the use of great gobs of memory can be expensive. And creating those large arrays, then working with them requires internal, implicitly generated loops. You will always have loops. But implicitly generated loops in compiled code tend to be much faster than explicit MATLAB loops.
Where possible in vectorized code, use either bsxfun or implicit scalar expansion. They can reduce the memory load.
You point out that the vectorization gain is not always uniform, or even consistent. That sometimes can be explained by too heavy use of memory. Thus if your temporary arrays grow so large that MATLAB has trouble finding the room, your computer will start swapping, using slow memory in exchange of fast memory.
Next, don't forget that different parts of your algorithm will have different time scalings as the problem size grows. So if one piece of the code is O(n^3) in time complexity, another is O(n), and another O(exp(n)), then the O(exp(n)) part will begin to dominate the computation as the problem grows in size, even if the constant out front was very small. So small computations may be dominated by the O(n) piece, but eventually, that O(exp(n)) term will grow until the problem becomes computationally impossible to solve.
Even vectorized code can be terribly slow, if the vectorization is terribly done. For example, suppose you want to scale every column of a matrix, using a vector of constants?
N = 10000;
A = rand(N);
K = 1:N;
So I want to multiply the j'th column by K(j).
You might use a double loop, thus effectively crap code written as if it was written in some low level language. At least loops are pretty well optimized in MATLAB these days. But still that will be slow as hell.
tic
B = A;
for I = 1:N
for J = 1:N
B(I,J) = A(I,J)*K(J);
end
end
toc
Elapsed time is 4.396439 seconds.
So not very fast. Had I not preallocated B, I would still be here days later waiting though.
We could do this using a matrix multiply, with a diagonal matrix. No need to preallocate. but way slower!
tic
B = A*diag(K);
toc
Elapsed time is 48.600078 seconds.
That is because a matrix*matrix multiply os an O(N^3) operation. But we only had to do N*2 multiplies! Most of those multiplies were multiplies by zero, then adding up a lot of mainly zeros.
Making K into a sparse diagonal matrix makes that part much faster.
tic
B = A*spdiags(K(:),0,N,N);
toc
Elapsed time is 0.641262 seconds.
But even sparse matrix multiplies are not perfect. A simple repmat is better here.
tic
B = A.*repmat(K,N,1);
toc
Elapsed time is 0.501364 seconds.
Of course, this is why bsxfun was invented, to replace those repmats.
tic
B = bsxfun(@times,A,K);
toc
Elapsed time is 0.289946 seconds.
If you have a new release of MATLAB, then implicit array expansion helps, because bsxfun brings some overhead.
tic
B = A.*K;
toc
Elapsed time is 0.207587 seconds.
And, of course, had N been larger or smaller, then all of these basic variations might change their times, some getting relatively larger or slower.
In the end, there is no single best way to vectorize ANY block of code. The best way will depend on the problem size. It will sometimes depend on the data itself. It will depend on your skill at writing code in MATLAB, and how well you understand how MATLAB uses memory. It will depend on your knowledge of what functions are available to you. (For example, have you blinked, and missed that some new capability was recently introduced into MATLAB?)
Best Answer