MATLAB: How to make data files available to the code running on MATLAB Distributed Computing Server

Tags: attach, computing, MATLAB Parallel Server, mdcs, parallel, pct, toolbox, upload

I have a MATLAB program which uses some data files. These files are currently stored on my local computer. I can use them without issue when running parallel code on my local machine, but when I try to run on my MATLAB Distributed Computing Server cluster, I receive errors saying that the files cannot be found.
How can I make these data files available to my code running on the cluster?

Best Answer

There are three ways to make local data files available to workers on a cluster:
1) *Create a _job_ for your computation, and attach files to the job.* This option requires no infrastructure changes, but it does not scale well if you have many workers, large files, or a large number of files, because the attached files are copied to the workers along with the job. The following example creates a job with attached files, adds a task, and submits the job. Your code must then refer to each attached file by its file name alone, as listed in 'AttachedFiles', rather than by its path on the local machine.
c = parcluster('myRemoteClusterProfile');
j = createCommunicatingJob(c,'AttachedFiles', {'myData.csv'});
t = createTask(j, @myFunc, 1, {10,10}); % myFunc has 1 output argument and two inputs
submit(j); % Submit the job to the cluster so it can be run
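As a minimal sketch of the file-name change described above, the task function might read the attached file as follows. The body of myFunc is hypothetical: the original post does not show it, and the sketch assumes myData.csv is purely numeric and at least nRows-by-nCols in size.

```matlab
function total = myFunc(nRows, nCols)
% Hypothetical task function (its real body is not shown in the post).
% The attached file is opened by its bare file name, not by the path it
% had on the client machine, e.g. not 'C:\Users\me\data\myData.csv'.
data = readmatrix('myData.csv');              % assumes numeric CSV content
total = sum(data(1:nRows, 1:nCols), 'all');   % one output, two inputs
end
```

Once the job has been submitted, wait(j) blocks until it finishes and fetchOutputs(j) returns the task outputs to the client.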
For more information about creating jobs for a cluster, refer to the Parallel Computing Toolbox documentation for createJob and createCommunicatingJob.
2) *Start a parallel pool, and attach files to the pool.* This option is very similar to (1), but the files are attached to a parallel pool instead of a job. The files remain on the workers for as long as the pool is open. The same considerations apply to this approach as to (1). Example:
c = parcluster('myRemoteClusterProfile');
poolobj = parpool(c);
addAttachedFiles(poolobj, {'file1.mat'});
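Code that then runs on the pool can open the attached file by name. The helper below is only a sketch: sumOnWorkers is a hypothetical function, and it assumes file1.mat contains a numeric vector named x, which the original post does not specify.

```matlab
function results = sumOnWorkers(n)
% Hypothetical helper: every parfor iteration loads the attached file by
% its bare file name, so no client-side path appears in the worker code.
% Assumes file1.mat holds a numeric vector named x (an assumption).
results = zeros(1, n);
parfor k = 1:n
    s = load('file1.mat');        % resolved on the worker
    results(k) = k * sum(s.x);    % placeholder computation
end
end
```

If an attached file changes on the client while the pool is open, updateAttachedFiles(poolobj) sends the new version to the workers, and delete(poolobj) shuts the pool down when the work is finished.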
For more information about attaching files to a parallel pool, refer to the Parallel Computing Toolbox documentation for addAttachedFiles.
3) *Place the data in a networked file share which the worker machines can access.* This option may require some infrastructure changes, depending on your network, but it scales better for large files and many workers. Your code must use the path to the data at the network location instead of the path to the data on the local hard drive.
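One way to keep the network location out of the rest of the code is to build the path in a single place. This is a sketch under stated assumptions: clusterDataPath is a hypothetical helper, DATA_ROOT is an assumed environment variable that you would set on every worker, and the fallback share name is a placeholder rather than a real server.

```matlab
function p = clusterDataPath(fileName)
% Hypothetical helper: return the full path of a file on the shared
% location that all workers can reach.
root = getenv('DATA_ROOT');                  % assumed to be set on workers
if isempty(root)
    root = '\\fileserver\projects\data';     % placeholder UNC path
end
p = fullfile(root, fileName);
end
```

The worker code would then call, for example, readmatrix(clusterDataPath('myData.csv')). The same share can be mounted under different paths on different machines, particularly when the client and the workers run different operating systems, so the root must be the path as the workers see it.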