MATLAB: Slow Execution of Parfor Loops due to Communication Overhead: Load static data into worker workspace memory


For my research, I need near-real-time execution of a large number (>1000) of matrix-vector multiplications of the form A*x, where A is a medium-sized matrix (e.g. 150×150). These matrices are constructed by an extremely expensive operation (it takes hours to complete) and are stored in a static data structure (MatSet in the example below). This data structure is used by all workers and is not modified after creation.
When I run my code, which is equivalent to the demo below, the PARFOR loop is more than 10 times slower than the FOR loop in MATLAB R2010b. I believe this is caused by the data (MatSet in this case) being transferred to the workers every time the loop runs. In my case, however, this transfer is completely unnecessary, because MatSet is a read-only dataset!
My question is whether there is some way of loading a STATIC dataset into the workspace of each worker, so as to avoid this unnecessary communication overhead. Is it possible to do this without having to load the data from disk?
Here is the demo code:
matlabpool(2); % start a pool of 2 workers (separate MATLAB processes)
Msize = 150; Nloop = 1000;
c1 = zeros(Msize, Nloop); c2 = zeros(Msize, Nloop);

% parallel initialization loop
MatSet = cell(Nloop, 1);
parfor i = 1:Nloop
    MatSet{i} = rand(Msize); % simulates expensive code operation
end

% real-time parallel loop (SLOW!)
tic;
parfor i = 1:Nloop
    c1(:,i) = MatSet{i} * rand(Msize, 1);
end
time1 = toc;

% real-time serial loop (for comparison)
tic;
for i = 1:Nloop
    c2(:,i) = MatSet{i} * rand(Msize, 1);
end
time2 = toc;

fprintf('Parallel time: %2.4f ms, Serial time: %2.4f ms\n', 1000*time1, 1000*time2);
matlabpool close;
Any comments are appreciated,
Coen

Best Answer

You might be able to take advantage of my Worker Object Wrapper, which is designed to help set up this sort of static data for use on the workers.
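For readers on newer releases, the same idea (send the static data to each worker once, then reuse it across PARFOR loops) can be sketched with the built-in parallel.pool.Constant. This is not the Worker Object Wrapper itself, and it is not available in R2010b: it assumes R2015b or later, where parpool replaces matlabpool. The variable names follow the demo code in the question. Treat it as a minimal sketch, not a tuned implementation.

```matlab
% Sketch only: assumes R2015b or later (parallel.pool.Constant and parpool
% are not available in R2010b, which the question uses).
if isempty(gcp('nocreate'))
    parpool(2);                      % start a pool of 2 workers
end

Msize = 150; Nloop = 1000;

% Stand-in for the expensive construction step from the question.
MatSet = cell(Nloop, 1);
parfor i = 1:Nloop
    MatSet{i} = rand(Msize);
end

% Transfer MatSet to every worker once. The worker-side copies persist
% until C is cleared or the pool is closed.
C = parallel.pool.Constant(MatSet);

c1 = zeros(Msize, Nloop);
tic;
parfor i = 1:Nloop
    M = C.Value;                     % worker-local data, not re-sent per loop
    c1(:, i) = M{i} * rand(Msize, 1);
end
time1 = toc;
fprintf('Parallel time with Constant: %2.4f ms\n', 1000*time1);
```

Note the trade-off: every worker holds the full MatSet (about 180 MB here, from 1000 × 150 × 150 × 8 bytes) rather than only the slices it needs, in exchange for paying the transfer cost once instead of on every run of the loop. If the data can be built on the workers, parallel.pool.Constant also accepts a function handle, which avoids sending the data from the client at all.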