MATLAB: How to use a nonshared file system with the Distributed Computing Toolbox 3.1 (R2007a)

I have a non-shared filesystem which I would like to use with the Distributed Computing Toolbox.

Best Answer

The Distributed Computing Toolbox (DCT) interface to third-party schedulers requires that both the MATLAB client and the MATLAB workers have direct access to a shared directory, known as the Data Location.
In practice, jobs are often submitted to a compute cluster, and results retrieved, from a desktop computer that does not share a network drive with the nodes of the cluster.
Example scripts that work around this problem are provided with Distributed Computing Toolbox 3.1 (R2007a). They are located in the directory
<matlabroot>\toolbox\distcomp\examples\integration\
where "<matlabroot>" is the directory returned by executing the following at the MATLAB prompt:
matlabroot
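As a small illustration, the directory above can be built and checked from the MATLAB prompt. This is a minimal sketch that uses only standard MATLAB functions (fullfile, exist, dir). The directory will only be found if the toolbox examples are installed.

```matlab
% Build the path to the integration examples in a platform-independent way.
exampleDir = fullfile(matlabroot, 'toolbox', 'distcomp', 'examples', 'integration');

% The directory is only present if the toolbox examples are installed,
% so check before listing it.
if exist(exampleDir, 'dir')
    disp(exampleDir);
    dir(exampleDir);   % lists the contents of the examples directory
else
    fprintf('Directory not found: %s\n', exampleDir);
end
```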
The solution implemented by the scripts is as follows. There are two Data Location directories: one on the client’s computer and one on the cluster. For example, the local Data Location might be C:\Temp\DCT, while the cluster Data Location might be /share/DCT.
The configuration for the cluster (stored in the file distcompUserConfig) contains the hostname of the cluster login node, as well as the name of the cluster Data Location (/share/DCT).
When the MATLAB client creates jobs and tasks, they are stored in the local Data Location (C:\Temp\DCT). When a job is submitted, all job information, including the tasks, is copied to the cluster Data Location (/share/DCT) before the scheduler’s submit command (for example, qsub for PBS or bsub for LSF) is called.
If the MATLAB client calls waitForState(job) or get(job, 'state'), a timer is started that checks the state of the job every 10 seconds by copying the job’s state file from the cluster to the local Data Location.
The scheduler dispatches jobs to the cluster. MATLAB workers start up, execute the tasks, and save the results in the cluster Data Location (/share/DCT). They also switch the job state to 'finished'.
On the next check, the MATLAB client copies the state file back and recognizes that the job has completed. It then copies the output data to the client’s local Data Location (C:\Temp\DCT), where it becomes available for retrieval by the MATLAB client.
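The cycle described above (copy to cluster, submit, poll the state, copy results back) can be sketched as a dry run that only prints the commands a Windows client might issue. This is not the code of the shipped scripts. The two Data Locations, qsub, jobscript, and the 10-second interval come from the text; the host name cluster01 is borrowed from the ssh example, and the job directory name Job1, the state file name Job1.state.mat, and the simulated state sequence are assumptions made for illustration.

```matlab
% Dry-run sketch of the copy / submit / poll / retrieve cycle.
localDL   = 'C:\Temp\DCT';            % local Data Location (from the text)
remoteDL  = '/share/DCT';             % cluster Data Location (from the text)
host      = 'cluster01';              % login node name (assumed)
jobName   = 'Job1';                   % hypothetical job directory name
stateFile = [jobName '.state.mat'];   % hypothetical state file name
pollSecs  = 10;                       % polling interval (from the text)

% 1. On submit: copy the job data to the cluster, then call the scheduler.
copyToCmd = sprintf('pscp -r "%s" %s:%s', [localDL '\' jobName], host, remoteDL);
submitCmd = sprintf('plink %s "qsub jobscript"', host);
fprintf('%s\n%s\n', copyToCmd, submitCmd);

% 2. While waiting: fetch the state file until the job reports 'finished'.
pollCmd = sprintf('pscp %s:%s/%s "%s"', host, remoteDL, stateFile, localDL);
simulatedStates = {'queued', 'running', 'finished'};  % stand-in for the real state file
state = '';
poll  = 0;
while ~strcmp(state, 'finished')
    poll = poll + 1;
    fprintf('%s\n', pollCmd);
    state = simulatedStates{min(poll, numel(simulatedStates))};
    fprintf('  job state: %s\n', state);
    % pause(pollSecs);   % a real client would wait between checks
end

% 3. After completion: copy the results back to the local Data Location.
fetchCmd = sprintf('pscp -r %s:%s/%s "%s"', host, remoteDL, jobName, localDL);
fprintf('%s\n', fetchCmd);
```

To execute the commands instead of printing them, each string could be passed to system(), which requires PuTTY on the client and login access to the cluster.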
The examples shipped with DCT cover the following two scenarios:
1. UNIX client submitting to UNIX cluster
scp is used to copy data to and from the cluster, and ssh is used to issue commands on the cluster login node, for example:
ssh cluster01 "qsub jobscript"
2. Windows client submitting to UNIX cluster
PuTTY is used to copy data and issue commands: pscp for data transfer and plink for command execution.
Note: The solution can be adapted to cover the case of a Windows client submitting to a Windows cluster.
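The difference between the two client scenarios comes down to which tools carry out the copy and the remote command. A minimal sketch, assuming the tools are on the system path and that cluster01 is the login node as in the ssh example:

```matlab
% Pick the copy and remote-command tools according to the client platform.
if ispc
    copyTool   = 'pscp';    % PuTTY secure copy (Windows client)
    remoteTool = 'plink';   % PuTTY command-line connection tool
else
    copyTool   = 'scp';     % OpenSSH secure copy (UNIX client)
    remoteTool = 'ssh';     % OpenSSH remote shell
end

host      = 'cluster01';        % login node name (assumed)
remoteCmd = 'qsub jobscript';   % scheduler command from the example in the text
fullCmd   = sprintf('%s %s "%s"', remoteTool, host, remoteCmd);
disp(fullCmd);

% To run the command for real, pass it to system():
% [status, output] = system(fullCmd);
```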
Example Scripts:
The README files in each of the directories provide more detailed instructions.
1. PBS
<matlabroot>\toolbox\distcomp\examples\integration\pbs\nonshared\unix
<matlabroot>\toolbox\distcomp\examples\integration\pbs\nonshared\windows
2. LSF
<matlabroot>\toolbox\distcomp\examples\integration\lsf\nonshared\unix
<matlabroot>\toolbox\distcomp\examples\integration\lsf\nonshared\windows
3. Other schedulers
You may adapt the scripts provided for PBS and LSF.
Changing the remote command/copy mechanism:
All remote data transfer and command execution are controlled by three files. You may edit these files to use other mechanisms, such as rsh or psexec. The files are:
copyDataToCluster.m
copyDataFromCluster.m
runCmdOnCluster.m
These files are shipped with the example scripts, for example:
<matlabroot>\toolbox\distcomp\examples\integration\pbs\nonshared\unix\copyDataToCluster.m
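As a rough illustration of swapping the mechanism, the three operations can be pictured as three interchangeable command builders. The function handles below are hypothetical stand-ins, not the shipped files, whose actual arguments and contents may differ (check the files and the README before editing). The sketch replaces scp and ssh with rcp and rsh.

```matlab
% Hypothetical stand-ins for the three operations. Each returns the command
% line it would run; the real shipped files may use different arguments.
copyToFcn   = @(localPath, host, remotePath) ...
    sprintf('rcp -r "%s" %s:%s', localPath, host, remotePath);
copyFromFcn = @(host, remotePath, localPath) ...
    sprintf('rcp -r %s:%s "%s"', host, remotePath, localPath);
runFcn      = @(host, cmd) sprintf('rsh %s "%s"', host, cmd);

% Example values; the host name and the job directory Job1 are assumptions.
host = 'cluster01';
disp(copyToFcn('/tmp/DCT/Job1', host, '/share/DCT'));
disp(runFcn(host, 'qsub jobscript'));
disp(copyFromFcn(host, '/share/DCT/Job1', '/tmp/DCT'));
```

Keeping the transfer and command logic in three small places means the rest of the submission code does not need to change when the mechanism does.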