MATLAB: Parfor slower than for in the code.

MATLAB, Parallel Computing Toolbox, parfor

Hi,
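My code is the following:
    parfor nn = 1:Na
        temp = A{nn}*B{nn};               % sparse matrix times Nb-by-1 vector
        temp(1,1) = 0; temp(Nb,1) = 0;    % zero the first and last entries
        D(nn,:) = C{nn}\temp;             % solve C{nn}*x = temp, store as row nn of D
    end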
Here A and C are cell arrays, each storing Na sparse matrices of size Nb-by-Nb, and B is a cell array storing Na vectors of size Nb-by-1.
I am finding that the loop above is faster when I use a 'for' loop instead of a 'parfor' loop. Does anyone know why? (Assume that I am only interested in the reason the code is slower with parfor than with for.)
Thank you.

Best Answer

For sufficiently large matrices, the * operation is handled by a high-performance multithreaded library that uses all available cores. When you use parfor, by default each worker gets one core, so in each worker the high-performance library effectively runs serially. You have several of those serial operations running at the same time, but each one takes more clock time than the original multithreaded operation. The total time is at best equivalent, and probably worse, since the number of cores used in multithreaded mode might exceed your parpool size.
Meanwhile, each parpool worker is a separate process that has to be created and set up to run MATLAB. You always pay that overhead when you use parfor, so parfor is often slower than a plain for loop.
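One way to see both effects on your own machine is to compare the client's thread count with the pool size and to time the pool startup. This is only a rough sketch: it assumes the Parallel Computing Toolbox is installed and that the default cluster profile is used; the variable names are just for illustration.

```matlab
% Number of computational threads the MATLAB client can use for
% built-in multithreaded operations.
nThreads = maxNumCompThreads;

% Look for an existing pool without starting one.
pool = gcp('nocreate');
if isempty(pool)
    tStart = tic;
    pool = parpool;                      % start a pool with the default profile (assumption)
    startupSeconds = toc(tStart);
    fprintf('Pool startup took %.1f s\n', startupSeconds);
end

fprintf('Client threads: %d, pool workers: %d\n', nThreads, pool.NumWorkers);
```

If the pool has fewer workers than the client has threads, the parfor version has fewer cores to work with than the plain for loop did.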
parfor wins under two circumstances:
  1. The code would not be automatically parallelized by the high-performance library, for example because the arrays are too small or the operation does not match an implemented pattern. In that case you still have the overhead of creating the workers, but you can get several iterations done in the same clock time.
  2. The code involves waiting for external resources such as disk or network. In that case, with multiple workers, some can be making progress while others are waiting, which is better than having the CPUs sit idle for every wait.
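To check which case applies to a loop like the one in the question, it may help to time both versions with the pool already running, so that startup cost is not counted. The sketch below is not the asker's actual data: Na, Nb, the 1% density, and the diagonal shift on C are made-up values chosen only so that the example runs, and the results will vary with matrix size and machine.

```matlab
Na = 200; Nb = 500;                      % hypothetical problem sizes
A = cell(Na,1); B = cell(Na,1); C = cell(Na,1);
for nn = 1:Na
    A{nn} = sprandn(Nb, Nb, 0.01);                    % random sparse Nb-by-Nb matrix
    B{nn} = randn(Nb, 1);                             % Nb-by-1 vector
    C{nn} = sprandn(Nb, Nb, 0.01) + 10*speye(Nb);     % diagonal shift intended to keep C{nn} nonsingular
end

% Start the pool before timing so startup overhead is excluded.
if isempty(gcp('nocreate'))
    parpool;
end

D1 = zeros(Na, Nb);
tic
for nn = 1:Na
    temp = A{nn}*B{nn};
    temp(1,1) = 0; temp(Nb,1) = 0;
    D1(nn,:) = (C{nn}\temp).';           % transpose so a row is stored in row nn
end
tFor = toc;

D2 = zeros(Na, Nb);
tic
parfor nn = 1:Na
    temp = A{nn}*B{nn};
    temp(1,1) = 0; temp(Nb,1) = 0;
    D2(nn,:) = (C{nn}\temp).';
end
tParfor = toc;

fprintf('for: %.3f s, parfor: %.3f s\n', tFor, tParfor);
```

The parfor timing still includes the cost of sending the A, B, and C entries to the workers, and the first parfor run after the pool starts tends to be slower than later runs, so it is worth running the comparison more than once.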