MATLAB: The Flop (Floating Point Operations per Second) Rate of MATLAB Code

flops, MATLAB, performance

Hello, I know that the Intel MKL / IPP libraries reach something like 80-95% of the CPU's theoretical performance (measured in FLOPS) on simple operations (multiplication, summation, matrix multiplication, vector multiplication).

http://software.intel.com/en-us/articles/parallelism-in-the-intel-math-kernel-library
http://software.intel.com/en-us/intel-mkl

Yet when I do the same thing in MATLAB, I get much worse results.

I have this simple script:

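numElements = 2 ^ 16;
numIter     = 100;
vecX = randn(numElements, 1, 'single');
vecY = randn(numElements, 1, 'single');
initTime = tic();
for ii = 1:numIter
    vecX .* vecY; % one multiplication per element; the result is discarded
end
stopTime = toc(initTime);
% Total operations divided by elapsed seconds, scaled from FLOPS to GFLOPS
gFlops = (numElements * numIter) / stopTime / 1e9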

Yet I get only about 1.1 to 1.4 GFLOPS on my i7-860, which should be closer to 2.8 GHz (frequency) * 4 (cores) * 4 (single-precision operations per cycle with a 128-bit SSE vector) = 44.8 GFLOPS.

That is only about 3% of the theoretical performance.
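The arithmetic behind those figures can be written out as a short sketch. The variable names are illustrative, and the inputs are simply the numbers quoted in the question (2.8 GHz, 4 cores, 4 single-precision operations per cycle, and the higher measured rate of 1.4 GFLOPS), not independently verified hardware specifications.

```matlab
% Figures quoted in the question (assumptions, not verified specs)
clockHz       = 2.8e9; % core frequency in Hz
numCores      = 4;     % physical cores
flopsPerCycle = 4;     % single-precision operations per cycle (128-bit SSE)

% Theoretical peak in GFLOPS
peakGflops = clockHz * numCores * flopsPerCycle / 1e9;

% Measured rate reported in the question, as a share of the peak
measuredGflops = 1.4;
efficiencyPct  = 100 * measuredGflops / peakGflops;

fprintf('Peak: %.1f GFLOPS, measured: %.1f GFLOPS (%.1f%% of peak)\n', ...
    peakGflops, measuredGflops, efficiencyPct);
```

With the lower reported figure of 1.1 GFLOPS, the same calculation gives roughly 2.5% of peak.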

How can MATLAB be so inefficient?

Best Answer

I don't think the way you're calculating FLOPS here is right. Even assuming FLOPS can be calculated like this, you're leaving out a lot of the overhead that MATLAB incurs. For example, try something like this:

numElements = 2 ^ 18;
numIter     = 100; % only used by the commented-out loop
vecX = randn(numElements, 1, 'single');
vecY = randn(numElements, 1, 'single');
initTime = tic();
% for ii = 1:numIter
%     vecX .* vecY;
% end
vecX + vecY; % I used +, but you can switch to .* as well
stopTime = toc(initTime);
% A single pass performs numElements operations, so there is no numIter factor
gFlops = numElements / stopTime / 1e9

Then see whether you notice a difference. I am pretty sure you will. Remember that for loops are slow in MATLAB.
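A single tic/toc pair around one short operation can be noisy. As a hedged alternative, the sketch below uses MATLAB's built-in timeit function (available in R2013b and later), which calls the function handle several times and returns a typical run time. The handle name multiplyFcn is made up for this example, and the count of one operation per element follows the question's own convention.

```matlab
numElements = 2 ^ 18;
vecX = randn(numElements, 1, 'single');
vecY = randn(numElements, 1, 'single');

% Wrap the operation in a function handle so timeit can call it repeatedly
multiplyFcn = @() vecX .* vecY;

% timeit handles warm-up and repetition and returns seconds per call
elapsedSec = timeit(multiplyFcn);

% One multiplication per element, scaled from FLOPS to GFLOPS
gFlops = numElements / elapsedSec / 1e9;
fprintf('Elementwise multiply: %.2f GFLOPS\n', gFlops);
```

Because timeit requests an output from the handle, the product is actually computed and returned rather than discarded.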

Related Solutions

MATLAB: Computing Recursive Function Efficiently

Use a loop. What is the problem? Why do you need to use recursion at all?

- Don't forget that MATLAB always uses 1-based indexing.
- Don't forget to preallocate a vector for the results, IF you need to retain them all. If all you need is the final element, there is no need for preallocation.

Compute f(0), then loop from 1 to n, computing f(1), f(2), and so on. Stop when you get to n. This is the essence of a loop.
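As a minimal sketch of that loop, assume a hypothetical recurrence f(0) = 1, f(k) = k * f(k-1), chosen only for illustration since the original recurrence is not shown here. Because MATLAB indexing starts at 1, f(k) is stored in results(k + 1).

```matlab
n = 10;

% Preallocate room for f(0) through f(n); f(k) lives at index k + 1
results = zeros(n + 1, 1);
results(1) = 1; % f(0), the assumed base case

for k = 1:n
    % Hypothetical recurrence: f(k) = k * f(k-1)
    results(k + 1) = k * results(k);
end

fN = results(end); % f(n)
fprintf('f(%d) = %d\n', n, fN);
```

If only the final value is needed, a single scalar updated inside the loop would do and the preallocated vector could be dropped.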

Surely you get the gist.

Using a recursive scheme here is silly, since it adds function call overhead at every recursive step. Memory must be allocated and variables passed, and that all takes CPU cycles. (You said you wanted efficiency.) And there are limits on the recursion depth you can descend to anyway.

You are looking for a fancy, sophisticated looking solution, when a simple one is available, at no real cost. The overhead of a loop is almost non-existent, certainly compared to anything else you might try.

My guess is you are asking to do this because of one of...

1. You are pre-optimizing, worrying about the cost of something before you see it is a problem. NEVER do this. Write reasonable code, and only optimize it when you see a bottleneck arise.
2. You have a known time problem. But the fact is, you said yourself that it is the computation of the multiplier that is expensive. (Or something equivalent to that multiplier.) So worrying about loop overhead is silly. Spend your time optimizing whatever it is that is actually taking up time.

MATLAB: IFFT slow down with using gpuArray

What graphics card do you have? How much RAM does it have? It could be that the larger array is just having a harder time because of memory constraints.
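Both questions can be answered from inside MATLAB. The sketch below assumes Parallel Computing Toolbox is installed, which is where gpuDeviceCount and gpuDevice come from. The guard is there only so the snippet still runs on a machine without the toolbox or without a supported GPU.

```matlab
% Check that the toolbox function exists and that at least one GPU is present
hasGpu = exist('gpuDeviceCount') > 0 && gpuDeviceCount > 0;

if hasGpu
    gpu = gpuDevice(); % the currently selected GPU
    fprintf('%s: %.2f GB total, %.2f GB available\n', ...
        gpu.Name, gpu.TotalMemory / 2^30, gpu.AvailableMemory / 2^30);
else
    disp('No supported GPU detected (or Parallel Computing Toolbox is missing).');
end
```

Comparing the available memory with the size of the array being transformed gives a first indication of whether memory pressure is a plausible cause.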
