MATLAB: How to make data files available to the code running on MATLAB Distributed Computing Server

Tags: attach, computing, MATLAB Parallel Server, mdcs, parallel, pct, toolbox, upload

I have a MATLAB program that uses some data files, which are currently stored on my local computer. I can use these files without issue when running parallel code on my local machine, but when I run on my MATLAB Distributed Computing Server cluster, I receive errors saying that the files cannot be found.

How can I make these data files available to my code running on the cluster?

Best Answer

There are three ways to make local data files available to workers on a cluster:

1) *Create a _job_ for your computation, and attach files to the job.* This option does not require infrastructure changes, but it will not scale well if you have many workers, large files, or a large number of files. The following example creates a job with attached files, adds a task, and submits the job. Your code must refer to each file by the filename listed in 'AttachedFiles' rather than by its path on the local machine.

c = parcluster('myRemoteClusterProfile');
j = createCommunicatingJob(c,'AttachedFiles', {'myData.csv'});
t = createTask(j, @myFunc, 1, {10,10}); % myFunc has 1 output argument and two inputs
submit(j); % Submit the job to the cluster so it can be run

Refer to the following documentation for more information about creating jobs for a cluster:

https://www.mathworks.com/help/distcomp/createcommunicatingjob.html
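To illustrate the filename change described in (1), here is a minimal sketch of what the task function might look like. The body of myFunc and the meaning of its two inputs are assumptions made for illustration; only the bare filename 'myData.csv' comes from the example above.

```matlab
function out = myFunc(nRows, nCols)
% Hypothetical task function: the real myFunc is not shown in this answer.
% The attached file is opened by its bare filename, not by the path it
% had on the client machine.
data = readmatrix('myData.csv');

% Return the top-left block of the data, clipped to the available size.
nRows = min(nRows, size(data, 1));
nCols = min(nCols, size(data, 2));
out = data(1:nRows, 1:nCols);
end
```

Once the job has finished, wait(j) followed by fetchOutputs(j) should return the results. If a particular file reader does not search the MATLAB path, getAttachedFilesFolder can be used on the worker to build a full path to the attached file.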

2) *Start a parallel pool, and attach files to the pool.* This option is very similar to (1), but the files are attached to a parallel pool instead of a job. The files remain on the workers while the pool is open. The same considerations apply to this approach as to (1). Example:

c = parcluster('myRemoteClusterProfile'); 
poolobj = parpool(c);
addAttachedFiles(poolobj, {'file1.mat'});

Refer to the following documentation for more information about attaching files to a parallel pool:

http://www.mathworks.com/help/distcomp/addattachedfiles.html
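Once a file is attached to the pool, code running on the pool can open it by bare filename. The following is a minimal sketch under that assumption; the function name countAttachedVars and its loop body are hypothetical and only serve to show a parfor loop reading the attached file.

```matlab
function nVars = countAttachedVars(fileName, nIter)
% Hypothetical helper: each parfor iteration loads the attached MAT-file
% by bare filename and records how many variables it contains.
nVars = zeros(1, nIter);
parfor k = 1:nIter
    s = load(fileName);                 % resolved on the worker, not the client
    nVars(k) = numel(fieldnames(s));
end
end
```

With the pool from the example above still open, a call such as countAttachedVars('file1.mat', 4) would run the iterations on the pool workers.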

3) *Place the data in a networked file share which the worker machines can access.* This option may require some infrastructure changes depending on your network, but it scales better for large files and many workers. Your code would need to use the path to the data at the network location instead of the path to the data on the local hard drive.
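For option (3), it can help to build the network path in one place and fail early with a clear message if a worker cannot see the file. This is a minimal sketch; the function name resolveSharedFile and the error identifier are hypothetical, and the share root you pass in depends on your own network.

```matlab
function dataFile = resolveSharedFile(shareRoot, fileName)
% Hypothetical helper: build the path to a file on the network share and
% check that the current machine (client or worker) can actually see it.
dataFile = fullfile(shareRoot, fileName);
if ~isfile(dataFile)
    error('resolveSharedFile:notFound', ...
          'File not visible from this machine: %s', dataFile);
end
end
```

Inside the code that runs on the workers, a call such as readmatrix(resolveSharedFile(shareRoot, 'myData.csv')) would then replace any reference to the local hard drive, where shareRoot is the share location as seen from the worker machines.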

Related Solutions

MATLAB: How to take advantage of useParallel option for a remote job running on workers in a MDCS cluster

1. To use a parallel pool on the cluster, consider submitting the job with the 'batch' function:

https://www.mathworks.com/help/distcomp/batch.html

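The 'batch' command lets you specify the number of pool workers in addition to the job itself, through the 'Pool' option. Please refer to the example below:

c = parcluster('profile');
% Assumes scriptUseParallel is a function with 1 output and no inputs
j = batch(c, @scriptUseParallel, 1, {}, 'Pool', 2); % 2 pool workers, plus 1 worker that runs the function
wait(j);              % Block until the job finishes
diary(j);             % Display the job's command window output
Y2 = fetchOutputs(j); % Cell array holding the function's outputs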

2. In the cluster setup, consider configuring each machine to run as many workers as it has cores.

Each core can then run its own MATLAB worker process and be used by the batch job.
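The function handle passed to 'batch' in step 1 must point to code that actually uses the pool. As a minimal sketch, assuming scriptUseParallel is a function with one output and no inputs (its body here is hypothetical, since the original is not shown), a parfor loop runs its iterations on the pool workers that the 'Pool' option requests:

```matlab
function y = scriptUseParallel()
% Hypothetical body: the real scriptUseParallel is not shown in this answer.
n = 8;
y = zeros(1, n);
parfor k = 1:n
    y(k) = k^2;   % iterations are independent, so they can run on any worker
end
end
```

Because fetchOutputs returns a cell array, the vector would be available as Y2{1} after the job completes.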

MATLAB: Why do I receive the error “Unable to load block diagram…” when simulating a model on a remote cluster

The error occurs because the workers on the MDCS cluster do not have access to the model file, so they are unable to load the model. The solution is to place the model in a location accessible to the MDCS workers. There are two ways to do this:

1) *Create a _job_ for your computation, and attach files to the job.* This option does not require infrastructure changes but will not scale well if you have many workers and/or large files. The following example creates a job with attached files, adds a task, and submits the job.

c = parcluster('myRemoteClusterProfile');
j = createJob(c,'AttachedFiles', {'myModel.slx'});
t = createTask(j, @myFunc, 1, {10,10}); % myFunc has 1 output argument and two inputs
submit(j); % Submit the job to the cluster so it can be run

2) *Place the model in a networked file share.* This option may require some infrastructure changes depending on your network, but it scales better for large files and many workers. The only change required to your code would be to use the full path to the model and data files instead of just the name.

For example:

sim('myModel.slx');

would become:

sim('\\path\to\network\share\myModel.slx');
