MATLAB: Why does a job take much longer with the HPC 2008 scheduler than with the native MATLAB scheduler in the Parallel Computing Toolbox 4.3 (R2010a)?

Parallel Computing Toolbox

When I execute my job using the native MATLAB scheduler, it is able to complete in 126 seconds. However, when I process the same job using the HPC 2008 scheduler, it takes 1086 seconds to complete, which is a whole order of magnitude slower. I am surprised that just changing the scheduler can have such a dramatic effect on job completion time.

Best Answer

In general, a third-party scheduler adds more latency than the native MATLAB scheduler. This latency is paid once per task, so it grows with the number of tasks created. When possible, it is much better to consolidate a large number of small tasks into a smaller number of larger tasks. Alternatively, you can use a PARFOR loop.
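
As a rough illustration of consolidating tasks, the sketch below splits a set of work items into a few contiguous index ranges and creates one task per range. The item count, the task count, and the function name processChunk are hypothetical, and a real HPC Server scheduler object will usually need further properties (or a configuration) set before it can accept jobs.

```matlab
% Sketch: consolidate many small work items into a few larger tasks.
numItems = 1000;   % hypothetical number of small work items
numTasks = 8;      % hypothetical number of larger tasks to create instead

% Split 1:numItems into numTasks contiguous, nearly equal index ranges.
edges = round(linspace(0, numItems, numTasks + 1));
chunks = cell(1, numTasks);
for k = 1:numTasks
    chunks{k} = (edges(k) + 1):edges(k + 1);
end

% Submission is guarded so the chunking can be tried without a cluster.
submitToCluster = false;   % set to true where the HPC Server scheduler is reachable
if submitToCluster
    sched = findResource('scheduler', 'type', 'hpcserver');
    job = createJob(sched);
    for k = 1:numTasks
        % processChunk is a hypothetical function that loops over its indices.
        createTask(job, @processChunk, 1, {chunks{k}});
    end
    submit(job);
    waitForState(job, 'finished');
    results = getAllOutputArguments(job);
end
```

With this layout the per-task overhead is paid 8 times rather than 1000 times, at the cost of coarser load balancing.
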
As for the timing itself, measuring from job submission to job completion is not a fair comparison, because it includes any time the job spends sitting in the queue behind other jobs. It is better to time only the task evaluation by using TIC/TOC inside the task function. Measured this way, there should be little difference between the two schedulers.
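
One way to time only the evaluation is to have the task function return its own elapsed time as an extra output. The function below is a minimal sketch: the name timedTask and the placeholder computation are assumptions, not part of the original job.

```matlab
function [result, elapsed] = timedTask(x)
% timedTask  Hypothetical task function that reports its own evaluation time.
t = tic;                 % start a timer on the worker
result = sum(x .^ 2);    % placeholder for the real computation
elapsed = toc(t);        % seconds spent evaluating, excluding queue time
end
```

Each task would then be created with two output arguments, for example createTask(job, @timedTask, 2, {data}). After the job finishes, getAllOutputArguments(job) returns a cell array with one row per task, and the second column holds the evaluation times to compare across schedulers.
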
As far as HPC 2008 specific concerns go, the time it takes to create tasks depends heavily on the DataLocation and the underlying filesystem. Each task requires 5 files to be created in the DataLocation, so if file access is slow, task creation will also be slow. All the task files are created in the same folder, and the more tasks you have, the slower it becomes to create a new file in that location. A DataLocation on a shared network drive adds further delay. Note that task evaluation also modifies job and task files in the DataLocation, so slow file access can increase the task evaluation time as well.
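
To get a feel for how expensive file creation is on a given drive, a small probe such as the following can help. It is only a sketch: it uses a temporary folder as a stand-in, and the file count and file names are arbitrary. To test the real storage, point dataLocation at a scratch folder on the same drive as the scheduler's DataLocation rather than at the DataLocation itself.

```matlab
% Sketch: measure how long it takes to create small files in one folder.
dataLocation = tempname;   % stand-in; use a scratch folder on the drive under test
mkdir(dataLocation);
numFiles = 50;             % arbitrary; 5 files per task would correspond to 10 tasks

t = tic;
for k = 1:numFiles
    fid = fopen(fullfile(dataLocation, sprintf('probe_%d.tmp', k)), 'w');
    fclose(fid);
end
elapsed = toc(t);

fprintf('Created %d files in %.3f s (%.1f ms per file)\n', ...
    numFiles, elapsed, 1000 * elapsed / numFiles);

rmdir(dataLocation, 's');  % remove the probe folder and its files
```

Comparing the per-file time on a local disk with the time on the shared drive gives a rough idea of how much of the slowdown comes from the filesystem.
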
With HPC Server, a new MATLAB instance is started for each task of a distributed job if the UseSOAJobSubmission property is false. If MDCS is run from a common network installation, there can be a lot of overhead when a large number of machines all try to launch MATLAB at the same time. If you would prefer to re-use MATLAB worker sessions, set UseSOAJobSubmission to true in the configuration (and ensure that SOA mode is properly set up on the cluster).
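
As a sketch of the same setting applied directly to a scheduler object instead of through a configuration, something like the following should work, assuming an HPC Server scheduler is reachable from the client and SOA mode is already set up on the cluster:

```matlab
% Sketch: enable SOA job submission on an HPC Server scheduler object.
% Assumes the cluster is reachable and SOA mode is configured on it.
sched = findResource('scheduler', 'type', 'hpcserver');
set(sched, 'UseSOAJobSubmission', true);

% Confirm the value before creating and submitting jobs.
disp(get(sched, 'UseSOAJobSubmission'));
```

Jobs created from this scheduler object afterwards would then be submitted in SOA mode. Putting the setting in the configuration instead keeps it consistent across sessions.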