There are three ways to make local data files available to workers on a cluster:
1) *Create a job for your computation, and attach files to the job.* This option requires no infrastructure changes, but it does not scale well if you have many workers, large files, or a large number of files. The following example creates a job with an attached file, adds a task, and submits the job. Note that the task function must refer to just the filename listed in 'AttachedFiles' (the file is copied to each worker), not the path to the file on the local machine.
c = parcluster('myRemoteClusterProfile');
j = createCommunicatingJob(c,'AttachedFiles',{'myData.csv'}); % file is copied to each worker
t = createTask(j,@myFunc,1,{10,10}); % myFunc can open 'myData.csv' by bare filename
submit(j);
Refer to the documentation on creating jobs for a cluster for more information.
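As a sketch of the full round trip (myFunc and myData.csv are the hypothetical names from the example above, with myFunc assumed to return one output):

wait(j);                   % block until the job finishes
results = fetchOutputs(j); % cell array with one row per task
% Inside myFunc, read the attached file by bare filename, e.g.:
%   data = readtable('myData.csv');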
2) *Start a parallel pool, and attach files to the pool.* This option is very similar to (1), but the files are attached to a parallel pool instead of a job. The files remain on the workers while the pool is open. The same scaling considerations apply as in (1). Example:
c = parcluster('myRemoteClusterProfile');
poolobj = parpool(c);
addAttachedFiles(poolobj,{'file1.mat'}); % copied to every worker in the pool
Refer to the documentation on attaching files to a parallel pool for more information.
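Once the file is attached, code running on the pool can load it by bare filename. A minimal sketch, reusing the hypothetical file1.mat from the example above:

parfor i = 1:4
    s = load('file1.mat'); % resolved from the worker's attached-files folder
    % ... use the variables in s for per-iteration work ...
end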
3) *Place the data on a networked file share that the worker machines can access.* This option may require some infrastructure changes depending on your network; however, it scales better for large files and many workers. Your code must use the path to the data at the network location instead of the path on your local hard drive.
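For example, if the share is visible at the same path on every worker (the server and share names below are hypothetical, and the path style depends on the workers' operating system, e.g. a UNC path on Windows or a mount point on Linux):

parfor i = 1:4
    T = readtable('\\fileserver\share\myData.csv'); % network path visible to all workers
    % ... per-iteration work on T ...
end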