Hi,
I am trying to use the MATLAB Parallel Computing Toolbox on a cluster. When I validate my scheduler configuration, the distributed job stage passes, but the parallel job stage fails with the following error:
Stage: Parallel Job
Status: Failed
Description: The given stage reached the default or user-specified timeout.
Command Line Output: 2346069.pbs001.palmetto.clemson.edu
I also found the following error in the log file on the cluster:
Node file: /var/spool/torque/aux//2346072.pbs001.palmetto.clemson.edu
Starting SMPD on node0218 node0219 node0275 node0276 ...
ssh node0218 "/opt/matlab-R2010a/bin/mw_smpd" -s -phrase MATLAB -port 26072
Warning: Permanently added 'node0218,10.125.1.218' (RSA) to the list of known hosts.
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,gssapi-with-mic,password).
Launching smpd failed for node: node0218
Stopping SMPD on ...
Exiting with code: 0
The settings which I have used for the scheduler are:
set(sched, 'ClusterMatlabRoot', '/opt/matlab-new');
set(sched, 'HasSharedFilesystem', true);
set(sched, 'ClusterOsType', 'unix');
set(sched, 'SubmitFcn', {@pbsNonSharedSimpleSubmitFcn, clusterHost, remoteDataLocation});
set(sched, 'ParallelSubmitFcn', {@pbsNonSharedParallelSubmitFcn, clusterHost, remoteDataLocation});
I have also set up passwordless ssh using an RSA key. Can anyone tell me what is wrong with my configuration?
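In the log, the failing login is `ssh node0218 ...` issued from inside the job to start mw_smpd, so the key has to work between compute nodes, non-interactively, not only from the submit host. A minimal sketch for checking that from inside a PBS job follows. It assumes Torque sets `PBS_NODEFILE` (one hostname per allocated core) in the job environment; `check_nodes` is a hypothetical helper, not part of MATLAB or Torque. `BatchMode=yes` makes ssh fail instead of prompting for a password, which is closer to how the job launches mw_smpd.

```shell
#!/bin/sh
# Hypothetical helper: try a non-interactive ssh login to every distinct host
# listed in a PBS/Torque node file. Prints "ok: HOST" or "FAILED: HOST" per host
# and returns 1 if any login failed, 0 otherwise.
check_nodes() {
  nodefile="$1"
  failed=0
  # The node file repeats a hostname once per core; test each host only once.
  for node in $(sort -u "$nodefile"); do
    # BatchMode=yes: never prompt for a password, fail instead.
    # ConnectTimeout=5: do not hang on an unreachable node.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true; then
      echo "ok: $node"
    else
      echo "FAILED: $node"
      failed=1
    fi
  done
  return "$failed"
}

# Usage inside a PBS job (assumes Torque has set PBS_NODEFILE):
#   check_nodes "$PBS_NODEFILE"
```

Any host reported as FAILED would be expected to reproduce the "Permission denied (publickey,...)" lines above.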
Thanks in advance.