MATLAB: When to use codistributed arrays


Imagine I have the following matrix:

A = rand(6400,6400)

Now imagine I create a codistributed array from it (I have 4 workers):

spmd
    dist = codistributor1d();               % option 1
    % dist = codistributor2dbc([2 2],3200); % option 2
    B = codistributed(A,dist)
end

There are two ways for me to distribute it: with codistributor1d(), each worker stores a 6400×1600 matrix (a quarter of the columns); with codistributor2dbc([2 2],3200), each worker stores a 3200×3200 block.
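To confirm which local piece each scheme produces, a short check like the following can help. It assumes an open pool of exactly four workers, as in the question; the names size1 and size2 are illustrative, and the sizes in the comments hold only for that pool size.

```matlab
% Assumes an open pool of exactly 4 workers.
A = rand(6400, 6400);
spmd
    B1 = codistributed(A, codistributor1d());               % default 1-D: split the columns
    B2 = codistributed(A, codistributor2dbc([2 2], 3200));  % 2x2 worker grid, 3200x3200 blocks
    size1 = size(getLocalPart(B1));   % local piece under the 1-D scheme
    size2 = size(getLocalPart(B2));   % local piece under the 2-D block-cyclic scheme
end
size1{1}   % worker 1's piece: 6400 1600
size2{1}   % worker 1's piece: 3200 3200
```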

My questions are:

When should I distribute an array?
How do I know which function, codistributor1d or codistributor2dbc, to use when I have an array I want to distribute between workers? I know how to work with both types of arrays, but I don't know when one is better than the other.

If anyone could help me, I'd appreciate it.

Best Answer

Distributed arrays are most useful when you do not have enough memory to store an entire array on a single machine. By distributing chunks of the original array across all the workers in the pool, you can perform operations on the entire array that you previously couldn't even store.
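As a rough illustration of the memory argument, here is the arithmetic for the 6400×6400 matrix from the question, assuming double precision (8 bytes per element) and an even split over four workers. Actual per-worker usage will be higher because each worker process has its own overhead.

```matlab
n = 6400;                                    % matrix is n-by-n
numWorkers = 4;                              % pool size from the question
bytesPerElement = 8;                         % double precision
totalMiB = n * n * bytesPerElement / 2^20;   % whole matrix on one machine
perWorkerMiB = totalMiB / numWorkers;        % each worker's share with an even split
fprintf('Total: %.3f MiB, per worker: %.3f MiB\n', totalMiB, perWorkerMiB);
% prints: Total: 312.500 MiB, per worker: 78.125 MiB
```

A matrix of this size fits comfortably on most machines; the benefit of distributing appears once n*n*8 bytes exceeds what a single machine can hold.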

I would suggest that you take a step back and start by working with distributed arrays rather than codistributed arrays. This allows MATLAB to choose a default distribution scheme for you.

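A = rand(6400,6400);
A = A*A' + eye(6400); % chol requires a symmetric positive definite matrix
parpool              % open a pool with the default profile (R2013b and later;
                     % older releases use "matlabpool open")
dA = distributed(A); % let MATLAB pick a default distribution scheme for the
                     % distributed array, dA
R = chol(dA);        % example of a function that works for
                     % distributed arrays - no spmd required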
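Note that distributed(A) first needs the whole of A in client memory. When the array is too large for that, which is the situation described at the start of this answer, it can be created directly on the workers instead. A minimal sketch using the codistributed.rand constructor inside spmd; the column means computed at the end are only an example operation.

```matlab
n = 6400;
spmd
    % Each worker generates only its own columns; no full copy exists on the client
    D = codistributed.rand(n, n, codistributor1d());
end
% Outside spmd, D is a distributed array
colMeans = gather(mean(D));   % 1-by-n result, small enough to bring back to the client
size(colMeans)                % 1 6400
```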

With distributed arrays, you can get started without having to worry about what distribution scheme to use. If you are interested, you can always query the distribution scheme that is currently used by a distributed array like so:

% dA is the distributed array created as above
spmd
codistr = getCodistributor(dA) % inside spmd we can "view" dA as a 
                               % codistributed array and access its
                               % distribution scheme
end
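Inside spmd you can also ask each worker what it actually stores. A sketch, assuming an open pool and the default scheme that distributed() picks for a 6400×6400 matrix (a 1-D split of the columns); schemeName and localSize are illustrative names.

```matlab
dA = distributed(rand(6400, 6400));        % same kind of array as above
spmd
    codistr = getCodistributor(dA);
    schemeName = class(codistr);           % name of the distribution scheme in use
    localSize = size(getLocalPart(dA));    % size of the piece stored on this worker
end
schemeName{1}   % 'codistributor1d' for the default scheme
localSize{1}    % 6400 rows by roughly (6400 / pool size) columns
```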

Once your code is working correctly with distributed arrays, you can make minor changes to use codistributed arrays with a specific distribution scheme. Changing the default distribution scheme may improve performance, but as Matt J mentions, choosing the most efficient distribution scheme is problem dependent and will depend heavily on the operations that you want to call.
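One way to make that choice empirically is to time the operations you care about under each scheme. The sketch below converts the same array with redistribute and compares wall-clock times, using matrix multiplication purely as a stand-in for your own workload. It assumes an open pool and that each client-side operation on a distributed array returns only after the workers have finished.

```matlab
dA = distributed(rand(6400, 6400));   % stand-in data
spmd
    d1 = redistribute(dA, codistributor1d());    % 1-D: whole columns on each worker
    d2 = redistribute(dA, codistributor2dbc());  % 2-D block-cyclic, default grid and block size
end
tic; P1 = d1 * d1; t1 = toc;   % workload under the 1-D scheme
tic; P2 = d2 * d2; t2 = toc;   % same workload under the 2-D block-cyclic scheme
fprintf('1-D: %.2f s, 2-D block-cyclic: %.2f s\n', t1, t2);
```

Timings vary with pool size, matrix size, and the operation itself, so repeat the measurement with your real workload before settling on a scheme.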

Related Solutions

MATLAB: Distributing arrays to workers for local processing

Hello. If you are able to successfully open a matlabpool with your installation of R2011b, then you must have the Parallel Computing Toolbox. In that case, the getLocalPart function should also be available to you. What is the output from typing the following at the MATLAB command line:

which getLocalPart

Assuming that you can get the issue with getLocalPart sorted out (perhaps by contacting technical support), this is how you would proceed with distributed arrays and spmd:

matlabpool open 100 % this will open 100 workers
                    % using your default configuration
                    % (in R2013b and later, use parpool(100) instead)
% I assume that myMat was already loaded as a standard MATLAB array
size(myMat)     % You've stated that myMat is 800000 x 2     
% There are a lot of rows, so let's use codistributor1d to 
% distribute the rows across all the workers in the pool.  This must
% be done inside the spmd block because that's where 
% codistributed arrays and codistributors live.
spmd 
  codist = codistributor1d(1); % Create a scheme to distribute the first
                               % dimension of a matrix (its rows) as evenly as
                               % possible across all the workers in the pool
  myMatdb = codistributed(myMat, codist);  % Use the scheme to create
                                           % distributed data
  chunk_of_data = getLocalPart(myMatdb);   % Each worker operates on its own rows
  out_of_chunk = objFun(params, chunk_of_data);
  % Create a new codistributed array from the local outputs. This assumes
  % out_of_chunk is the same size as chunk_of_data on each worker, so the
  % codistributor of myMatdb (which, unlike codist, also records the global
  % size and partition) can be reused.
  fullOutput = codistributed.build(out_of_chunk, getCodistributor(myMatdb));
end  
% fullOutput and myMatdb can be used as distributed arrays outside of the spmd block
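For a quick end-to-end check of this pattern, here is a self-contained version at the same data size. objFun and params are replaced by a hypothetical stand-in (doubling each local chunk), so each output chunk is the same size as its input chunk, which is what allows the array's own codistributor to be reused in codistributed.build. It assumes a pool is already open.

```matlab
myMat = rand(800000, 2);                       % stand-in for the real data
spmd
    myMatdb = codistributed(myMat, codistributor1d(1));  % split the rows
    chunk_of_data = getLocalPart(myMatdb);               % this worker's rows
    out_of_chunk = 2 * chunk_of_data;                    % hypothetical stand-in for objFun
    % Reuse the complete codistributor of myMatdb, since out_of_chunk has the same size
    fullOutput = codistributed.build(out_of_chunk, getCodistributor(myMatdb));
end
result = gather(fullOutput);                   % copy the whole result back to the client
isequal(result, 2 * myMat)                     % true
```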

You can find more information here:

help getLocalPart
help codistributed.build

MATLAB: Split a matrix on parallel computing with spmd

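You could do that using codistributed arrays. For example, with a pool of four workers:

    X = rand(4, 18);   % create the matrix once so every worker gets the same copy
    spmd
        d = codistributed(X, codistributor1d(1))   % distribute the rows
    end

will result in each worker storing a 1x18 portion (one row) of d.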
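To see how the split changes with the dimension you choose, the sketch below distributes the same 4×18 matrix by rows and by columns and prints each worker's piece. It assumes a pool of exactly four workers; since 18 columns cannot be split evenly over four workers, some workers hold five columns and others four.

```matlab
X = rand(4, 18);                      % replicated: same copy sent to every worker
spmd
    dRows = codistributed(X, codistributor1d(1));   % split along rows
    dCols = codistributed(X, codistributor1d(2));   % split along columns
    rowPiece = size(getLocalPart(dRows));           % 1-by-18 on each of 4 workers
    colPiece = size(getLocalPart(dCols));           % 4-by-5 or 4-by-4
end
for k = 1:4                           % assumes a 4-worker pool
    fprintf('worker %d: row split %s, column split %s\n', k, ...
        mat2str(rowPiece{k}), mat2str(colPiece{k}));
end
```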
