charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
- From: Dominic Roehm <dominic.roehm AT gmail.com>
- To: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
- Subject: [charm] charmrun timeout problem
- Date: Fri, 21 Nov 2014 18:39:59 +0100
- List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
- List-id: CHARM parallel programming system <charm.cs.uiuc.edu>
Hi,
I tried to run my Charm++ code on two 8-core nodes of my local cluster, but
the node program times out during startup. The code itself runs
successfully on other workstations and clusters. Does anyone have an idea
what the problem is, or how to get more information about the issue? The
error message is below.
Dominic
Charmrun> charmrun started...
Charmrun> adding client 0: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 1: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 2: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 3: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 4: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 5: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 6: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 7: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 8: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 9: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 10: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 11: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 12: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 13: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 14: "127.0.0.1", IP:127.0.0.1
Charmrun> adding client 15: "127.0.0.1", IP:127.0.0.1
Charmrun> Charmrun = 10.1.255.254, port = 45243
Charmrun> Sending "$CmiMyNode 10.1.255.254 45243 8527 0" to client 0.
Charmrun> find the node program
"/home/dominic/work/hmm/benchmarks/charm_tp2/../../CoHMM/charm_bin/2D_Kriging"
at "/home/dominic/work/hmm/benchmarks/charm_tp2" for 0.
Charmrun> Starting mpiexec ./charmrun.8527
Charmrun> mpiexec started
Charmrun> node programs all started
Charmrun> Waiting for 0-th client to connect.
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> remote responding...
Charmrun remote shell(127.0.0.1.0)> starting node-program...
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun remote shell(127.0.0.1.0)> rsh phase successful.
Charmrun> error 0 attaching to node:
Timeout waiting for node-program to connect
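
One way to get more information out of a log like the one above is to tally the address charmrun reports for each client. In this log all sixteen clients are listed as 127.0.0.1, while charmrun itself reports 10.1.255.254; whether that is related to the timeout is not established here. The sketch below is a hypothetical helper, not part of Charm++, and it assumes only the "adding client" line format visible in the log. The names `CLIENT_LINE` and `client_ips` are made up for illustration.

```python
import re
from collections import Counter

# Assumed line format, copied from the log above:
#   Charmrun> adding client 3: "127.0.0.1", IP:127.0.0.1
CLIENT_LINE = re.compile(r'^Charmrun> adding client (\d+): "([^"]*)", IP:(\S+)')


def client_ips(log_text):
    """Count how many clients the log lists for each IP address."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CLIENT_LINE.match(line.strip())
        if match:
            # group(3) is the address after "IP:"
            counts[match.group(3)] += 1
    return counts


# Rebuild the sixteen "adding client" lines from the log as sample input.
sample = "\n".join(
    f'Charmrun> adding client {i}: "127.0.0.1", IP:127.0.0.1' for i in range(16)
)

for ip, n in client_ips(sample).items():
    print(f"{ip}: {n} client(s)")
```

If a run on two nodes is expected to show two distinct addresses, a single loopback address in this tally is one thing worth checking in the nodelist or launcher setup.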