We want to set up a cluster of two PCs (Intel Core i5, 4 cores per machine). We are using MATLAB R2009b and the Admin Center to create a job manager with 4 workers, one core per worker (2 workers per machine). The mdce service is installed on both machines with the default mdce_def file. This part works fine.
The problem appears when we try to run a parallel configuration that uses this job manager with a minimum and maximum of 4 workers: the parallel test fails.
The failure generates several error lines in mdce-service.log in the log folder:
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-15:out:job aborted using terminate/kill:
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-15:out:process: node: exit code: error message:
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-15:err:MPI_Comm_connect(119)…………………: MPI_Comm_connect(port="tag=0 port=28351 description=lp-apd12 ifname=172.22.4.92 ", MPI_INFO_NULL, root=0, comm=0x84000000, newcomm=0000000001023A60) failed
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-15:out:0: localhost: 1: Fatal error in MPI_Comm_connect: Other MPI error, error stack:
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-15:err:MPID_Comm_connect(187)………………..:
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-15:err:MPIDI_Comm_connect(405)……………….:
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-15:err:MPIC_Sendrecv(126)……………………:
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-15:out:MPI_Comm_connect(119)…………………: MPI_Comm_connect(port="tag=0 port=28351 description=lp-apd12 ifname=172.22.4.92 ", MPI_INFO_NULL, root=0, comm=0x84000000, newcomm=0000000001023A60) failed
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-15:err:MPIC_Wait(270)……………………….:
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-15:out:MPID_Comm_connect(187)………………..:
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-15:out:MPIDI_Comm_connect(405)……………….:
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-15:err:MPIDI_CH3i_Progress_wait(215)………….: an error occurred while handling an event returned by MPIDU_Sock_Wait()
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-15:out:MPIC_Sendrecv(126)……………………:
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-15:err:MPIDI_CH3I_Progress_handle_sock_event(420):
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-15:out:MPIC_Wait(270)……………………….:
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-15:out:MPIDI_CH3i_Progress_wait(215)………….: an error occurred while handling an event returned by MPIDU_Sock_Wait()
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-15:out:MPIDI_CH3I_Progress_handle_sock_event(420):
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-16:err:Fatal error in MPI_Intercomm_merge: Other MPI error, error stack:
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-16:err:MPI_Intercomm_merge(284): MPI_Intercomm_merge(comm=0xc4000005, high=1, newintracomm=0000000001023A68) failed
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-16:out:
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-16:out:job aborted using terminate/kill:
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-15:out:MPIDU_Sock_wait(2603)…………………: The specified network name is no longer available. (errno 64)
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-16:out:process: node: exit code: error message:
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-16:out:0: localhost: 1: Fatal error in MPI_Intercomm_merge: Other MPI error, error stack:
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-16:out:MPI_Intercomm_merge(284): MPI_Intercomm_merge(comm=0xc4000005, high=1, newintracomm=0000000001023A68) failed
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-16:out:MPI_Intercomm_merge(262): Too many communicators
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-15:err:MPIDU_Sock_wait(2603)…………………: The specified network name is no longer available. (errno 64)
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-16:err:MPI_Intercomm_merge(262): Too many communicators
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-17:out:Warning: Unrecognized MATLAB option "cp".
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-17:out:Warning: Unrecognized MATLAB option "nodisplay".
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-17:out:Warning: Unrecognized MATLAB option "Djava.security.policy=C:\Program Files\MATLAB\R2009b\toolbox\distcomp\config\jsk-all.policy".
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-18:out:Warning: Unrecognized MATLAB option "cp".
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-18:out:Warning: Unrecognized MATLAB option "nodisplay".
INFO | jvm 1 | 2011/08/18 16:09:04 | Thu Aug 18 16:09:04 CEST 2011:Group-18:out:Warning: Unrecognized MATLAB option "Djava.security.policy=C:\Program Files\MATLAB\R2009b\toolbox\distcomp\config\jsk-all.policy".
INFO | jvm 1 | 2011/08/18 16:09:05 | Thu Aug 18 16:09:05 CEST 2011:Group-18:out:Warning: Unable to locate a personal folder for $documents\MATLAB
INFO | jvm 1 | 2011/08/18 16:09:05 | Thu Aug 18 16:09:05 CEST 2011:Group-18:out:{Warning: Userpath must be an absolute path and must exist on disk.}
INFO | jvm 1 | 2011/08/18 16:09:05 | Thu Aug 18 16:09:05 CEST 2011:Group-17:out:Warning: Unable to locate a personal folder for $documents\MATLAB
INFO | jvm 1 | 2011/08/18 16:09:05 | Thu Aug 18 16:09:05 CEST 2011:Group-17:out:{Warning: Userpath must be an absolute path and must exist on disk.}
INFO | jvm 1 | 2011/08/18 16:09:06 | Thu Aug 18 16:09:06 CEST 2011:Group-17:out:
INFO | jvm 1 | 2011/08/18 16:09:06 | Thu Aug 18 16:09:06 CEST 2011:Group-17:out: To get started, type one of these: helpwin, helpdesk, or demo.
INFO | jvm 1 | 2011/08/18 16:09:06 | Thu Aug 18 16:09:06 CEST 2011:Group-17:out: For product information, visit www.mathworks.com.
INFO | jvm 1 | 2011/08/18 16:09:06 | Thu Aug 18 16:09:06 CEST 2011:Group-17:out:
INFO | jvm 1 | 2011/08/18 16:09:06 | Thu Aug 18 16:09:06 CEST 2011:Group-18:out:
INFO | jvm 1 | 2011/08/18 16:09:06 | Thu Aug 18 16:09:06 CEST 2011:Group-18:out: To get started, type one of these: helpwin, helpdesk, or demo.
INFO | jvm 1 | 2011/08/18 16:09:06 | Thu Aug 18 16:09:06 CEST 2011:Group-18:out: For product information, visit www.mathworks.com.
INFO | jvm 1 | 2011/08/18 16:09:06 | Thu Aug 18 16:09:06 CEST 2011:Group-18:out:
INFO | jvm 1 | 2011/08/18 16:09:07 | Thu Aug 18 16:09:07 CEST 2011:Group-17:out:» Thu Aug 18 16:09:07 CEST 2011 Worker started: pc-goba_worker02
INFO | jvm 1 | 2011/08/18 16:09:08 | Thu Aug 18 16:09:07 CEST 2011:Group-18:out:» Thu Aug 18 16:09:07 CEST 2011 Worker started: pc-goba_worker01
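Since the out and err streams of several worker groups are interleaved in the excerpt above, the MPI error stack is easier to read once the lines are regrouped per group. The following is only a minimal sketch in Python, assuming every line ends with the `:Group-N:out:message` or `:Group-N:err:message` layout shown above; the function name `group_log_lines` and the file name in the usage note are hypothetical, not part of MATLAB or mdce.

```python
import re
from collections import OrderedDict

# Assumed layout of the tail of each log line: ":Group-15:err:message"
LINE_RE = re.compile(r":(Group-\d+):(out|err):(.*)$")


def group_log_lines(lines):
    """Collect messages per group, dropping blanks and out/err duplicates.

    Returns an OrderedDict mapping a group name (e.g. "Group-15") to the
    list of distinct messages in the order they first appeared.
    """
    groups = OrderedDict()
    for line in lines:
        match = LINE_RE.search(line.rstrip("\n"))
        if match is None:
            continue  # not a worker-group line
        group, _stream, message = match.groups()
        message = message.strip()
        if not message:
            continue  # skip empty out/err lines
        messages = groups.setdefault(group, [])
        if message not in messages:
            messages.append(message)  # same text on out and err is kept once
    return groups


def print_grouped(lines):
    """Print each group's messages as an indented block."""
    for group, messages in group_log_lines(lines).items():
        print(group)
        for message in messages:
            print("    " + message)
```

To use it on the real file, something like `print_grouped(open("mdce-service.log", encoding="utf-8", errors="replace"))` should work, with the path adjusted to the actual log folder.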
Thanks