I have MATLAB Parallel Server set up on a cluster running LSF. When I attempt to validate the cluster configuration it fails.
MATLAB: Why am I unable to validate the LSF configuration in the Parallel Computing Toolbox?
Several issues can prevent validation of the cluster. Run the tests below to make sure that your configuration is set up properly. If you receive an error message at any point, you can submit a request to Installation support using the link at the bottom of the page. When submitting a request, be sure to include the following:
- Your license number
- The release of MATLAB on the client and the cluster
- The output of your validation (click details to get the full information)
- The results of the tests below
Also, when submitting a request, please reference Solution 1-BJRNU9.
1) Test the licensing of MATLAB Parallel Server
The first step is to ensure that the licensing for MATLAB Parallel Server works on your cluster. This also tests whether MATLAB is crashing on startup on your cluster. To test this, go to one of the cluster nodes and open a Windows Command Prompt: click the Start Menu, go to All Programs, then Accessories, and click Command Prompt. In the command prompt, run the following commands:
cd $MATLAB\bin (where $MATLAB is the installation folder for MATLAB on the cluster)
matlab.exe -dmlworker -nodisplay -logfile C:\output.txt -r "ver;exit"
This will generate an output.txt file in C:\ that contains the ver output on the cluster. If the log file contains a network license manager error, this is the issue. In that case, check the support site for the license manager error number and take the appropriate action to resolve the license error before proceeding.
2) Check the releases of MATLAB on the cluster and the client where you validated
If you get the output of the "ver" command in the log file, check the releases of all the products in the list. The release of each product should match for all the products. Additionally, the release should match the release that is installed on the client where you ran the validation. To check the release on the client, run the ver command in MATLAB's command window. If the release of Parallel Computing Toolbox and MATLAB do not match the release of MATLAB and MATLAB Parallel Server on the cluster, you will not be able to use this configuration until the installations are at the same release.
3) Check to make sure that your configuration meets the scheduler requirements
In order to use MATLAB Parallel Server with CCS/HPC Server 2008, there are some additional requirements in the setup. Check the scheduler requirements page here for more details:
Additionally, this configuration requires the following:
- If your client machine is not on the cluster, you will need to install the Microsoft Compute Cluster Pack on the client.
- This configuration requires that the data for the jobs be stored on a shared file space between the clients and the cluster nodes. When creating the configuration, set the "DataLocation" variable to be a path that is accessible to all computers. Ex: \\server\share\user\data
- If the cluster nodes have a local installation of MATLAB Parallel Server and it is installed in a path with spaces, such as C:\Program Files, you will need to modify the client configuration's "ClusterMatlabRoot" variable to use the old 8.3 character name format for the path. For example, if MATLAB is installed in C:\Program Files\MATLAB\R2009b, the ClusterMatlabRoot must be set to C:\PROGRA~1\MATLAB\R2009b. If the installation of MATLAB is on a shared server space, this is not an issue.
- In order to use the configuration each node must have the "CCP_SCHEDULER" environment variable set to point to the head node of the cluster. This is also true for the clients running MATLAB if they are not located in the cluster.
4) Check to ensure you have correctly configured the client configuration
In your client MATLAB, go to the Parallel menu and select Manage Configurations. Right-click your ccs/hpcserver configuration and select Properties. You must set the appropriate values for ClusterMatlabRoot (the directory where MATLAB is installed on the cluster) and DataLocation (where the data will be stored; NOTE: this must be accessible from the same path on all computers). You may want to set SchedulerHostname to your head node as well.
For R2009b and higher, make sure you set ClusterVersion to the appropriate version for your cluster as well.
If you have confirmed all of the settings above, do all stages fail during validation, or just the Parallel Job and Matlabpool stages? If you are able to pass the Distributed Job phase, the validation may be reporting false errors. To confirm, you can manually validate your cluster as follows:
1. Distributed job:
To run a simple distributed job, run the following:
ccs = findResource('scheduler','configuration','<ConfigurationName>')
Where "ConfigurationName" is the name of the configuration you created
job = createJob(ccs);
createTask(job, @sum, 1, {[1 1]});
createTask(job, @sum, 1, {[2 2]});
createTask(job, @sum, 1, {[3 3]});
submit(job)
waitForState(job, 'finished', 60)
To confirm the job completed, run the following:
results = getAllOutputArguments(job)
If you get the following output, your cluster is configured and operating correctly.
results =
    [2]
    [4]
    [6]
2. Parallel job:
After completing the distributed job, run the following:
pj = createParallelJob(ccs);
createTask(pj, @labindex, 1, {});
set(pj, 'MaximumNumberOfWorkers', 3);
set(pj, 'MinimumNumberOfWorkers', 3);
submit(pj)
waitForState(pj, 'finished', 60)
To confirm the job completed, run the following:
results = getAllOutputArguments(pj)
If you get the following output, your cluster is configured and operating correctly.
results =
    [1]
    [2]
    [3]
3. MATLAB pool job:
To test MATLABPool or PMODE, simply run the command:
matlabpool open <ConfigName> <#ofLabs>
Where "ConfigName" is the name of the configuration and "#ofLabs" is the number of labs (workers) to start on the cluster.
If your prompt is returned, your configuration is working. To close the MATLAB pool, run "matlabpool close"; in a PMODE session, type "exit" at the prompt instead.
If the MATLAB pool did not start and you did not receive an error message, try running:
setSchedulerMessageHandler(@disp)
and then try the MATLAB pool commands above. This should capture the error messages and forward them to the MATLAB command window.
If the manual tests passed, your configuration is working and you should be able to submit jobs.
If you are still having an issue, contact Installation support here:
NOTE: Starting in R2019a, the following name changes occurred:
- MATLAB Distributed Computing Server was renamed to MATLAB Parallel Server
- mdce_def was renamed to mjs_def
- mdce binary was renamed to mjs
Several issues can prevent validation of the cluster. Run the tests below to make sure that your configuration is set up properly. If you receive an error message at any point, you can submit a request to Installation support using the link at the bottom of the page. When submitting a request, be sure to include the following:
- Your license number
- The release of MATLAB on the client and the cluster
- The output of your validation (click details to get the full information)
- The results of the tests below
1) Test the licensing of MATLAB Parallel Server
The first step is to ensure that the licensing for MATLAB Parallel Server works on your cluster. This will also test to see if MATLAB is crashing on startup on your cluster. To test this, go to one of the cluster nodes and open up a Terminal window, then run the following commands:
cd $MATLAB/bin (where $MATLAB is the installation folder for MATLAB on the cluster)
./matlab -dmlworker -nodisplay -logfile /var/tmp/output.txt -r "ver;exit"
This will generate an output.txt file in /var/tmp that contains the ver output on the cluster. If the log file contains a network license manager error, this is the issue. In that case, check MATLAB Answers for the license manager error number and take the appropriate action to resolve the license error before proceeding.
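The log check described above can be sketched as a small shell function. This is only an illustration: the function name check_license_log is hypothetical, and the search string assumes that license failures appear in the log with the text "License Manager Error" followed by an error number. Adjust the pattern if your log words the error differently.

```shell
#!/bin/sh
# Hypothetical helper: scan a MATLAB worker log for license manager errors.
# Assumption: failures are reported as "License Manager Error -N".
check_license_log() {
    log_file="$1"
    if [ ! -f "$log_file" ]; then
        # No log at all usually means MATLAB never started
        echo "missing"
        return 2
    fi
    if grep -q "License Manager Error" "$log_file"; then
        # Print the matching lines so the error number is visible
        grep "License Manager Error" "$log_file"
        return 1
    fi
    echo "ok"
    return 0
}

# Example usage (path taken from the command above):
#   check_license_log /var/tmp/output.txt
```

A return status of 1 means the log contains a license manager error whose number you can look up; a status of 2 means no log file was found.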
2) Check the releases of MATLAB on the cluster and the client where you validated
If you get the output of the "ver" command in the log file, check the releases (R20XXx) of all the products in the list. The release of each product should match for all the products. Additionally, the release should match the release that is installed on the client where you ran the validation. To check the release on the client, run the ver command in MATLAB's command window. If the release of Parallel Computing Toolbox and MATLAB do not match the release of MATLAB and MATLAB Parallel Server on the cluster, you will not be able to use this configuration until the installations are at the same release.
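If the product list is long, comparing releases by eye is error-prone. The following shell sketch lists the distinct release tags found in a saved "ver" log and reports whether they all agree. The function names distinct_releases and releases_match are hypothetical, and the pattern assumes release tags of the form (R20XXx) as described above; it relies on "grep -o", which is available in GNU and BSD grep.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct release tag, e.g. "(R2019a)",
# found in a saved "ver" log.
distinct_releases() {
    grep -o '(R[0-9][0-9][0-9][0-9][ab])' "$1" | sort -u
}

# Succeeds only when exactly one distinct release tag is present.
releases_match() {
    count=$(distinct_releases "$1" | wc -l | tr -d ' ')
    [ "$count" -eq 1 ]
}

# Example usage:
#   distinct_releases /var/tmp/output.txt
#   releases_match /var/tmp/output.txt && echo "all products at one release"
```

Run the same check against the "ver" output saved from the client, then compare the single tag reported on each side.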
3) Check to make sure that your configuration meets the scheduler requirements
In order to use MATLAB Parallel Server with SLURM, there are some additional requirements in the setup:
- The scheduler binaries do not need to be accessible from the MATLAB client that runs Parallel Computing Toolbox. If the client does not have the binaries, you can submit jobs by utilizing the nonshared configuration on the MATLAB client or by remotely accessing one of the cluster nodes to run the MATLAB client.
- Your cluster should be completely homogeneous; SLURM currently only supports Linux. Mixing different platforms or distributions is not recommended especially for parallel computation.
- This configuration requires that the data for the jobs be stored on a shared file space between the clients and the cluster nodes. When creating the configuration, set the "JobStorageLocation" property to be a path that is accessible to all computers.
- The MATLAB client machine does not have to be the same operating system as the cluster.
- Passwordless SSH access between compute nodes must be set up in order to submit jobs.
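Two of the requirements above, shared job storage and passwordless SSH, can be checked from a terminal on a cluster node. The following is a minimal sketch, not an official validation tool: the function names, the example path, and the example host name are hypothetical. It uses the standard OpenSSH options BatchMode and ConnectTimeout so that ssh fails instead of prompting for a password.

```shell
#!/bin/sh
# Hypothetical helper: confirm the job storage directory exists and is
# writable from this machine by creating and removing a probe file.
check_job_storage() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir"
        return 1
    fi
    probe="$dir/.storage_probe_$$"
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        echo "writable: $dir"
        return 0
    fi
    echo "not writable: $dir"
    return 1
}

# Hypothetical helper: confirm ssh to a node works without a password.
check_passwordless_ssh() {
    host="$1"
    # BatchMode=yes makes ssh fail rather than prompt for a password
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
        echo "passwordless ssh ok: $host"
        return 0
    fi
    echo "passwordless ssh failed: $host"
    return 1
}

# Example usage (path and host name are placeholders):
#   check_job_storage /shared/matlab_jobs
#   check_passwordless_ssh node02
```

Run the storage check on the client and on each compute node with the same path, since the location must be reachable from all of them.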
For more information, consult the SLURM FAQ here:
4) Check to ensure you have correctly configured the client configuration
In your MATLAB client, go to the Parallel menu and select Manage Cluster Profiles. Click your Generic Profile for the SLURM configuration and then click Edit. You must set the appropriate values for ClusterMatlabRoot (the directory where MATLAB is installed on the cluster), JobStorageLocation (where the data will be stored; this must be accessible from the same path on all computers), and HasSharedFilesystem (should be set to True).
For more information on filling out the Generic Profile, reference the following guide:
If you have confirmed all of the settings above, do all stages fail during validation, or just the parallel and matlabpool/parpool stages? If you are able to pass the Distributed Job phase, the validation may be reporting false errors. To confirm, manually validate your cluster by following the instructions in the following article:
If the manual tests passed, your configuration is working and you should be able to submit jobs.
If you still have issues, contact support.