MATLAB: When to use codistributed arrays


Imagine I have the following matrix:
A = rand(6400,6400)
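Now imagine I create a codistributed array from it (I have 4 workers), using one of these two codistributors:
spmd
    dist = codistributor1d();                % option 1: split along one dimension
    % dist = codistributor2dbc([2 2],3200);  % option 2: 2-D block-cyclic on a 2-by-2 worker grid
    B = codistributed(A,dist)
end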
There are two ways for me to distribute it. Either each worker stores a 3200×3200 matrix or each worker stores a 6400×1600 matrix.
My questions are:
  • When should I distribute an array?
  • How do I know which function, codistributor1d or codistributor2dbc, I should use when I have an array I want to distribute between workers? I know how to work with both types of arrays, but I don't know when one is better than the other.
If anyone could help me, I'd appreciate it.

Best Answer

Distributed arrays are most useful when you do not have enough memory to store an entire array on a single machine. By distributing chunks of the original array across all the workers in the pool, you can perform operations on the entire array that you previously couldn't even store.
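To put the memory argument in numbers for the 6400-by-6400 example, here is a rough back-of-the-envelope sketch. It assumes double precision (8 bytes per element), an even split across the 4 workers from the question, and no per-worker overhead; the variable names are only illustrative.

```matlab
% Rough memory estimate for the 6400-by-6400 example in the question.
% Assumptions: double precision (8 bytes per element), 4 workers,
% an even split, and no bookkeeping overhead on the workers.
n = 6400;
numWorkers = 4;
bytesPerDouble = 8;

totalBytes = n * n * bytesPerDouble;        % whole matrix on one machine
perWorkerBytes = totalBytes / numWorkers;   % share held by each worker

fprintf('Full array: %.0f MiB\n', totalBytes / 2^20);
fprintf('Per worker: %.0f MiB\n', perWorkerBytes / 2^20);
```

That works out to roughly 312 MiB for the full array and about 78 MiB per worker, so this particular matrix would normally fit on a single machine. The memory benefit only starts to matter for arrays much larger than this one.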
I would suggest that you take a step back and start by working with distributed arrays rather than codistributed arrays. This allows MATLAB to choose a default distribution scheme for you.
A = rand(6400,6400);
A = A*A' + 6400*eye(6400);  % make A symmetric positive definite, as chol requires
matlabpool open             % in R2013b and later, use parpool instead
dA = distributed(A);        % let MATLAB pick a default distribution scheme for dA
R = chol(dA);               % example of a function that works for distributed arrays; no spmd required
With distributed arrays, you can get started without having to worry about what distribution scheme to use. If you are interested, you can always query the distribution scheme that is currently used by a distributed array like so:
% dA is the distributed array created as above
spmd
    % inside spmd we can "view" dA as a codistributed array
    % and access its distribution scheme
    codistr = getCodistributor(dA)
end
Once your code is working correctly with distributed arrays, you can make minor changes to use codistributed arrays with a specific distribution scheme. Changing the default distribution scheme may improve performance, but as Matt J mentions, the most efficient distribution scheme is problem-dependent and depends heavily on the operations that you want to call.
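As a rough sketch of what those "minor changes" might look like, the snippet below redistributes dA under the two schemes from the question and checks how much of the array each worker holds. It assumes a pool of exactly 4 workers is open and that dA is the 6400-by-6400 distributed array created earlier. The variable names are made up for illustration, and the expected sizes in the comments follow from the even split described in the question rather than from a run.

```matlab
% Sketch only: assumes exactly 4 workers and the distributed array dA from above.
spmd
    byColumns = codistributor1d(2);              % 1-D scheme: split along dimension 2
    byBlocks  = codistributor2dbc([2 2], 3200);  % 2-D block-cyclic: 2-by-2 worker grid, block size 3200

    dCols   = redistribute(dA, byColumns);       % same data, column-wise layout
    dBlocks = redistribute(dA, byBlocks);        % same data, block layout

    sizeCols   = size(getLocalPart(dCols));      % expected 6400-by-1600 on each worker
    sizeBlocks = size(getLocalPart(dBlocks));    % expected 3200-by-3200 on each worker
end

% Back on the client, sizeCols and sizeBlocks are Composite objects,
% indexed by worker number.
disp(sizeCols{1});
disp(sizeBlocks{1});
```

To decide between the two, one option is to time the operation you actually care about (for example with tic and toc) on each layout and keep whichever is faster for your problem.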