MATLAB: ‘parfor’ runs sequentially when using large dataset

Parallel Computing Toolbox

I am running the Parallel Computing Toolbox (PCT) locally using a 'parfor' loop. My machine has 64GB of memory and 4 available cores.
Within the 'parfor' loop, I am displaying the counter variable "i". For a small dataset, the counter is displayed out of order, which indicates that the work is truly being done in parallel.
For a large dataset (10GB), the counter appears to be displayed in serial order every time, and the code does not speed up at all, which makes it seem like it is not running in parallel. Task Manager shows no disk activity, only 30% memory utilization and 30% CPU usage.
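The loop is structured roughly like this (a simplified sketch; 'createDataset', 'processChunk', and the iteration count are placeholders, not my actual code):

pool = parpool('local', 4);             % 4 local process workers
bigData = createDataset();              % placeholder for loading the ~10GB dataset
n = 1000;                               % placeholder iteration count
out = zeros(1, n);
parfor i = 1:n
    disp(i)                             % out-of-order output => running in parallel
    out(i) = processChunk(bigData, i);  % placeholder per-iteration work
end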
I received the following warning message:
Warning: Error caught during construction of remote parfor code.
The parfor construct will now be run locally rather than on the remote parallel pool. The most likely cause of this is an inability to send input arguments to the workers because of a serialization
error. The error report from the caught error is:
Error using distcompserialize64
Out of Memory during serialization
But considering I have 64GB of memory and only copied the data to 4 workers, I wonder why it runs out of memory.

Best Answer

Unfortunately, the process of copying the data to the workers makes several temporary copies during serialization, so 64GB of memory is not enough to send a 10GB dataset to each of 4 workers (4 x 10GB plus the transient copies). When the input data cannot be serialized to the pool, the 'parfor' loop falls back to running sequentially on the client, which is the behavior you are observing.
As possible workarounds:
1) If thread workers are sufficient, they might avoid the data duplication, since thread-based workers run in the same process as the client (see the sketch at the end of this answer). I.e. use
parpool('threads')
before running the 'parfor' loop.

2) With process workers (i.e. parpool('local')), 'parallel.pool.Constant' might help. Here's how you would do that:
% Only the function handle is serialized to the workers; each worker calls it
% once to build its own copy of the data, so the 10GB array itself is never
% sent from the client.
data = parallel.pool.Constant(@createLargeArray);
parfor i = 1:3000
   out(i) = doStuff(data.Value, i);   % access the per-worker data via .Value
end
Here, I've assumed a function 'createLargeArray' which creates the 10GB data, and a function 'doStuff' that uses that data.
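For completeness, here is a minimal sketch of how those two helpers might look; the bodies are placeholders for your actual data creation and per-iteration work:

function data = createLargeArray()
% Placeholder: build or load the large dataset. With parallel.pool.Constant,
% this runs once on each worker rather than being sent from the client.
data = rand(1e4);   % stand-in for the real ~10GB array
end

function result = doStuff(data, i)
% Placeholder: per-iteration work that only reads from the shared data.
result = sum(data(:, mod(i-1, size(data,2)) + 1));
end

And if you go with workaround 1, thread workers share memory with the client, so (under the same placeholder assumptions) the data does not need a Constant at all:

parpool('threads');          % thread-based pool; workers share client memory
data = createLargeArray();   % created once on the client, not copied per worker
parfor i = 1:3000
   out(i) = doStuff(data, i);
end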