MATLAB: Improving code performance by compiling

MATLAB, matlab coder, MATLAB Compiler

Hello,

I'm coding a program where runtime is relevant, so I'm looking for ways to optimize performance. From what I've read here

https://de.mathworks.com/matlabcentral/answers/223937-should-i-use-matlab-compiler-sdk-or-matlab-coder-to-integrate-my-matlab-applications-with-c-c

MATLAB Compiler and Compiler SDK (I don't really know the difference between the two) can create standalone apps that support most features, including graphics, but they do not speed up the code, since it is not actually compiled. MATLAB Coder, on the other hand, can improve runtime but does not support graphics (which I need). So in my case, the only way to use compilation to speed up the program would be to put code into functions wherever possible and then compile those into MEX files. Is that right?

Best Answer

Optimization of code is done in different steps:

Write the code as clean and clear as possible. Do not start with premature optimization, because it is too prone to bugs.
Prove that the code works correctly using unit or integration tests.
Then you have a starting point against which to compare the results of the improved versions.
Use the profiler to find the bottlenecks. It is not worth optimizing a piece of code that accounts for only 1% of the processing time.
In many situations, investing some brain power can accelerate the code massively: process matrices columnwise instead of rowwise, move repeated code out of loops, and avoid creating variables dynamically with EVAL or with LOAD without storing the output in a variable. Maths can be useful too, e.g. by reducing the number of expensive EXP or POWER calls.
Rewriting the bottlenecks as C-Mex functions can be very efficient, but if e.g. the main work is spent in linear algebra routines, Matlab already uses highly optimized libraries.
If rewriting the code as C-Mex is too expensive, try the Coder. It converts the code automatically, but with some overhead compared to a manually written C-Mex function.
If graphics are the bottleneck, there is no solution. Matlab 2009a was much faster for a lot of tasks, and the 20 year old Matlab 6.5 beat them all, because it did not use Java for the rendering. But of course the ancient Matlab versions have many drawbacks too - if you want a box around a diagram, you have to rotate the diagram by 0.0001 degrees if OpenGL is used as renderer...
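A rough sketch of the profiling and rewriting steps above. The data, the constants a and b, and the formula are made up for illustration; they only stand in for a real bottleneck.

```matlab
% Hypothetical bottleneck: data and formula are invented for this sketch.
n = 2e5;
x = rand(n, 1);
a = 0.3;
b = 1.7;

profile on                          % start collecting timing data
y1 = zeros(n, 1);                   % preallocated result
for k = 1:n
    % exp(a) does not depend on k, yet it is recomputed in every pass,
    % and two more EXP calls are used where one would do.
    y1(k) = exp(a) * exp(a * x(k)) * exp(b * x(k));
end
profile off                         % stop collecting
stats = profile('info');            % struct; stats.FunctionTable holds the timings
% profile viewer                    % interactive report, needs a desktop session

% Rewrite: hoist the constant out of the loop and merge the exponentials,
% using exp(a*x) * exp(b*x) = exp((a + b)*x).
c  = exp(a);                        % computed once
y2 = c * exp((a + b) * x);          % a single vectorized EXP call

% Quick sanity check that the rewrite did not change the result.
relErr = max(abs(y1 - y2) ./ abs(y1));
```

The rewrite here needs no compilation at all; whether it pays off in a real program depends on what the profiler actually reports.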

After the optimization is finished, compare the results with the initial version. Compiled functions and even C-Mex functions are not guaranteed to be compatible with future versions, so take care to keep the original, non-optimized M-files.
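One possible way to keep the original code as a fallback and as a reference, sketched under assumptions: fastKernel_mex is a hypothetical MEX name, and the anonymous function stands in for the original M-file.

```matlab
% Reference implementation; in a real project this would be the original M-file.
referenceKernel = @(x) cumsum(x .^ 2);

% exist(name, 'file') returns 3 if a MEX-file of that name is on the path.
% 'fastKernel_mex' is a hypothetical name used only for this sketch.
if exist('fastKernel_mex', 'file') == 3
    kernel = @fastKernel_mex;       % use the compiled version when available
else
    kernel = referenceKernel;       % otherwise fall back to the M-code
end

% Compare the version in use against the reference on test data.
x       = rand(1e4, 1);
yRef    = referenceKernel(x);
yUsed   = kernel(x);
maxDiff = max(abs(yRef - yUsed));   % expected to be zero or rounding noise
```

If the MEX file stops working after a MATLAB upgrade, removing it from the path is then enough to get back to the M-code.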

Compiling can accelerate your code by a factor of 2, with some luck. Exploiting the underlying maths and improving the MATLAB code can give you a factor of 100. You can find examples in the forum of pure Matlab code that became 200 and 1000 times faster. So it may be worth sharing the code of your bottlenecks.

Related Solutions

MATLAB: What are the best practice in mex

Writing efficient C code is definitely a very important and interesting question, which should be answered in a forum about C.

Some good programming methods concern C as well as Matlab:

In nested loops, move the ones with more iterations to the inside. This reduces the overhead for starting loops.
Access memory in contiguous blocks: While X(:, 1) can be copied efficiently because Matlab stores the elements in column order, reading or writing X(1, :) is much more expensive.
Avoid repeated calculations. Store them in a temporary variable instead.
Divide large problems into chunks that fit into the processor cache.
Use optimized libraries for linear algebra. Programming e.g. a matrix multiplication is a really bad idea.
Write clean and clear code first. Debugging is more important than runtime optimization. Optimizing the code is the last step of the programming process.
A good design is the foundation of a good implementation. So start a large project (I guess that 2000 lines of code are "large" already) with pencil and paper at least, not in the editor. Inserting features afterwards will usually lead to spaghetti code.
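The remark about contiguous memory access can be checked with a rough timing sketch in MATLAB. The matrix size is an arbitrary choice, and the measured times will vary with machine and release.

```matlab
n = 2000;
X = rand(n);

% Column-wise access: X(:, j) is one contiguous block in memory.
sCol = 0;
tic
for j = 1:n
    sCol = sCol + sum(X(:, j));
end
tCol = toc;

% Row-wise access: the elements of X(i, :) are n entries apart in memory.
sRow = 0;
tic
for i = 1:n
    sRow = sRow + sum(X(i, :));
end
tRow = toc;

fprintf('column-wise: %.4f s, row-wise: %.4f s\n', tCol, tRow);
```

Both loops add up the same numbers, so sCol and sRow agree up to rounding; only the access pattern differs.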

MATLAB: Improving the performance of vectorized code possible

I looked quickly at the code you were playing with in the original post.

The biggest problem was arguably that no arrays were preallocated. ALWAYS preallocate arrays that will be grown in size dynamically. This is perhaps the biggest reason for code (written by novices) running slowly.

Failing to preallocate an array that is then grown iteratively causes a quadratic time penalty. Each time the array grows, MATLAB may have to allocate a new block of memory and copy the entire existing contents into it, so the total time spent copying grows quadratically with the final array size.
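A minimal sketch of the difference, with an arbitrary array length. How large the gap is depends on the MATLAB release and on the array size, so the timings are only indicative.

```matlab
n = 5e4;

% Growing the array one element at a time.
tic
grown = [];
for k = 1:n
    grown(end + 1) = k^2;           %#ok<SAGROW> deliberately not preallocated
end
tGrow = toc;

% Preallocating once, then filling in place.
tic
pre = zeros(1, n);
for k = 1:n
    pre(k) = k^2;
end
tPre = toc;

fprintf('grown: %.4f s, preallocated: %.4f s\n', tGrow, tPre);
```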

Of course, if you never grow any arrays or vectors, then there is no need to preallocate. That is often a consequence of "vectorization". But there are many ways to vectorize code. No single way exists. Often, vectorization just means re-thinking the code flow, re-thinking the basic algorithm. Very often vectorization means you need to trade off memory for external explicit loops. But the use of great gobs of memory can be expensive. And creating those large arrays, then working with them requires internal, implicitly generated loops. You will always have loops. But implicitly generated loops in compiled code tend to be much faster than explicit MATLAB loops.

Where possible in vectorized code, use either bsxfun or implicit scalar expansion. They can reduce the memory load.

You point out that the vectorization gain is not always uniform, or even consistent. That can sometimes be explained by too heavy a use of memory. If your temporary arrays grow so large that MATLAB has trouble finding room for them, your computer will start swapping, using slow disk storage in place of fast RAM.

Next, don't forget that different parts of your algorithm will have different time scalings as the problem size grows. So if one piece of the code is O(n^3) in time complexity, another is O(n), and another O(exp(n)), then the O(exp(n)) part will begin to dominate the computation as the problem grows in size, even if the constant out front was very small. So small computations may be dominated by the O(n) piece, but eventually, that O(exp(n)) term will grow until the problem becomes computationally impossible to solve.
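To make the scaling argument concrete, here is a small sketch with three made-up cost terms. The constants are invented; they are chosen only so that the exponential term starts out negligible.

```matlab
n = 1:60;

% Hypothetical cost models in seconds; all constants are assumptions.
tLinear = 1e-3  * n;                % O(n) part with a large constant
tCubic  = 1e-6  * n.^3;             % O(n^3) part
tExp    = 1e-12 * exp(n);           % O(exp(n)) part with a tiny constant

% For each n, find which term contributes most (1 = linear, 2 = cubic, 3 = exp).
[~, dominant] = max([tLinear; tCubic; tExp], [], 1);

% First problem size at which the exponential term takes over.
firstExp = find(dominant == 3, 1);
fprintf('exponential term dominates from n = %d\n', firstExp);
```

With these constants the linear term dominates for small n, and the exponential term takes over well before n reaches 60, despite its tiny constant.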

Even vectorized code can be terribly slow, if the vectorization is terribly done. For example, suppose you want to scale every column of a matrix, using a vector of constants?

N = 10000;
A = rand(N);
K = 1:N;

So I want to multiply the j'th column by K(j).

You might use a double loop, thus effectively crap code written as if it was written in some low level language. At least loops are pretty well optimized in MATLAB these days. But still that will be slow as hell.

tic
B = A; % preallocate B!
for I = 1:N
  for J = 1:N
    B(I,J) = A(I,J)*K(J);
  end
end
toc
Elapsed time is 4.396439 seconds.

So not very fast. Had I not preallocated B, I would still be here days later waiting though.

We could do this using a matrix multiply with a diagonal matrix. No need to preallocate, but way slower!

tic
B = A*diag(K);
toc
Elapsed time is 48.600078 seconds.

That is because a matrix*matrix multiply is an O(N^3) operation. But we only had to do N^2 multiplies! Most of those multiplies were multiplies by zero, followed by adding up a lot of mainly zeros.

Making K into a sparse diagonal matrix makes that part much faster.

tic
B = A*spdiags(K(:),0,N,N);
toc
Elapsed time is 0.641262 seconds.

But even sparse matrix multiplies are not perfect. A simple repmat is better here.

tic
B = A.*repmat(K,N,1);
toc
Elapsed time is 0.501364 seconds.

Of course, this is why bsxfun was invented, to replace those repmats.

tic
B = bsxfun(@times,A,K);
toc
Elapsed time is 0.289946 seconds.

If you have a recent release of MATLAB (R2016b or later), then implicit array expansion helps, because bsxfun brings some overhead.

tic
B = A.*K;
toc
Elapsed time is 0.207587 seconds.

And, of course, had N been larger or smaller, then the times of all these basic variations might change, some getting relatively faster and others slower.
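To repeat such comparisons for other problem sizes, timeit can be used instead of a single tic/toc pair, since it runs the code several times. This is only a sketch: N is chosen smaller than above to keep it quick, timeit needs R2013b or later, and the A.*K form needs R2016b or later.

```matlab
N = 2000;                           % smaller than in the runs above
A = rand(N);
K = 1:N;

% Each handle wraps one of the variants from above.
tRepmat   = timeit(@() A .* repmat(K, N, 1));
tBsxfun   = timeit(@() bsxfun(@times, A, K));
tImplicit = timeit(@() A .* K);     % implicit expansion

fprintf('repmat:   %.4f s\n', tRepmat);
fprintf('bsxfun:   %.4f s\n', tBsxfun);
fprintf('implicit: %.4f s\n', tImplicit);

% The variants should produce the same matrix.
B1 = bsxfun(@times, A, K);
B2 = A .* K;
sameResult = isequal(B1, B2);
```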

In the end, there is no single best way to vectorize ANY block of code. The best way will depend on the problem size. It will sometimes depend on the data itself. It will depend on your skill at writing code in MATLAB, and how well you understand how MATLAB uses memory. It will depend on your knowledge of what functions are available to you. (For example, have you blinked, and missed that some new capability was recently introduced into MATLAB?)
