- From: Chiara Orsini <chiara.orsini AT iet.unipi.it>
- To: charm AT cs.uiuc.edu
- Subject: [charm] how to run processes on a distributed environment
- Date: Wed, 23 Jun 2010 16:06:17 +0200
- List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
- List-id: CHARM parallel programming system <charm.cs.uiuc.edu>
Dear Charm users,
I tried to run the simplearrayhello program on multiple machines (using a nodelist file), but I ran into some problems.
Specifically:
1) I built charm-6.2.0-net-darwin-x86_64 on a MacBook (Mac OS X 10.6.4, IP address A) and on an iMac (Mac OS X 10.6.4, IP address B).
2) Then I configured node A and node B so that each can reach the other over ssh without a password.
3) I successfully compiled the charm-6.2.0-net-darwin-x86_64/tests/charm++/simplearrayhello/ test program on both computers.
4) Here is the nodelist file I saved on node B:
group main ++shell ssh
host localhost
host node_B_name
host node_A_IP_address
host localhost
host node_B_name
host node_A_IP_address
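For reference, a minimal Python sketch of how the entries in a nodelist like this one could be checked before launching. It is not part of Charm++; the function names parse_nodelist and loopback_hosts are hypothetical, and the 192.0.2.x addresses stand in for the anonymized names used in this post. It flags hosts that resolve to a loopback address, since such an address can only be reached from the machine itself. Whether that matters for charmrun is an assumption to be verified, not a confirmed diagnosis.

```python
import ipaddress
import socket


def parse_nodelist(text):
    """Return the host names found on 'host' lines, in file order."""
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "host":
            hosts.append(fields[1])
    return hosts


def loopback_hosts(hosts, resolve=socket.gethostbyname):
    """Return (host, ip) pairs whose resolved address is a loopback address."""
    flagged = []
    for host in dict.fromkeys(hosts):  # drop repeated entries, keep order
        ip = resolve(host)
        if ipaddress.ip_address(ip).is_loopback:
            flagged.append((host, ip))
    return flagged


NODELIST = """\
group main ++shell ssh
host localhost
host node_B_name
host node_A_IP_address
"""

if __name__ == "__main__":
    # The placeholder names from the post do not resolve, so they are
    # mapped by hand to documentation-only addresses (assumption).
    fake_dns = {
        "localhost": "127.0.0.1",
        "node_B_name": "192.0.2.2",
        "node_A_IP_address": "192.0.2.1",
    }
    print(loopback_hosts(parse_nodelist(NODELIST), fake_dns.__getitem__))
```

On a real setup, the second argument can be left out so that the names are resolved through socket.gethostbyname on the machine where charmrun is started.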
5) If I run ./charmrun hello +p2 ++verbose in a terminal on node B, the program runs to completion (localhost and node_B_name both refer to the local machine).
Even though the program ends, I get this error message: "could not lookup DNS configuration info service: (ipc/send) invalid destination port".
This is the complete output:
Charmrun> charmrun started...
Charmrun> using ./nodelist as nodesfile
Charmrun> adding client 0: "localhost", IP:127.0.0.1
Charmrun> adding client 1: "node_B_name", IP:node_B_IP_address
Charmrun> Charmrun = node_B_IP_address, port = 51729
Charmrun> Sending "0 node_B_IP_address 51729 4610 0" to client 0.
Charmrun> find the node program "/Users/user/Library/charm-6.2.0-net-darwin-x86_64/tests/charm++/simplearrayhello/hello" at "/Users/user/Library/charm-6.2.0-net-darwin-x86_64/tests/charm++/simplearrayhello" for 0.
Charmrun> Starting ssh localhost -l user /bin/sh -f
Charmrun> remote shell (localhost:0) started
Charmrun> Sending "1 node_B_IP_address 51729 4610 0" to client 1.
Charmrun> find the node program "/Users/user/Library/charm-6.2.0-net-darwin-x86_64/tests/charm++/simplearrayhello/hello" at "/Users/user/Library/charm-6.2.0-net-darwin-x86_64/tests/charm++/simplearrayhello" for 1.
Charmrun> Starting ssh node_B_name -l user /bin/sh -f
Charmrun> remote shell (node_B_name:1) started
Charmrun> node programs all started
Charmrun remote shell(localhost.0)> remote responding...
Charmrun remote shell(localhost.0)> starting node-program...
Charmrun remote shell(localhost.0)> rsh phase successful.
Charmrun remote shell(node_B_name.1)> remote responding...
Charmrun remote shell(node_B_name.1)> starting node-program...
Charmrun remote shell(node_B_name.1)> rsh phase successful.
Charmrun> Waiting for 0-th client to connect.
Charmrun> Waiting for 1-th client to connect.
Charmrun> client 0 connected (IP=127.0.0.1 data_port=58160)
Charmrun> client 1 connected (IP=node_B_IP_address data_port=58504)
Charmrun> All clients connected.
Charmrun> IP tables sent.
Charmrun> node programs all connected
Charm++: scheduler running in netpoll mode.
Charm++> cpu topology info is being gathered.
could not lookup DNS configuration info service: (ipc/send) invalid destination port
could not lookup DNS configuration info service: (ipc/send) invalid destination port
Charm++> Running on 1 unique compute nodes (2-way SMP).
Running Hello on 2 processors for 5 elements
Hello 0 created
Hello 1 created
Hello 2 created
Hi[17] from element 0
Hi[18] from element 1
Hi[19] from element 2
All done
Hello 3 created
Hello 4 created
Hi[20] from element 3
Hi[21] from element 4
Charmrun> Graceful exit.
6) If I run ./charmrun hello +p3 ++verbose in a terminal on node B, the program does not end. This time the program should activate a chare on computer A.
When I run this command, I can see two hello processes running on computer B and one hello process running on computer A.
As I said, the program never ends. This is the output I obtain:
Charmrun> charmrun started...
Charmrun> using ./nodelist as nodesfile
Charmrun> adding client 0: "localhost", IP:127.0.0.1
Charmrun> adding client 1: "node_B_name", IP:node_B_IP_address
Charmrun> adding client 2: "node_A_IP_address", IP:node_A_IP_address
Charmrun> Charmrun = node_B_IP_address, port = 51794
Charmrun> Sending "0 node_B_IP_address 51794 4640 0" to client 0.
Charmrun> find the node program "/Users/user/Library/charm-6.2.0-net-darwin-x86_64/tests/charm++/simplearrayhello/hello" at "/Users/user/Library/charm-6.2.0-net-darwin-x86_64/tests/charm++/simplearrayhello" for 0.
Charmrun> Starting ssh localhost -l user /bin/sh -f
Charmrun> remote shell (localhost:0) started
Charmrun> Sending "1 node_B_IP_address 51794 4640 0" to client 1.
Charmrun> find the node program "/Users/user/Library/charm-6.2.0-net-darwin-x86_64/tests/charm++/simplearrayhello/hello" at "/Users/user/Library/charm-6.2.0-net-darwin-x86_64/tests/charm++/simplearrayhello" for 1.
Charmrun> Starting ssh node_B_name -l user /bin/sh -f
Charmrun> remote shell (node_B_name:1) started
Charmrun> Sending "2 node_B_IP_address 51794 4640 0" to client 2.
Charmrun> find the node program "/Users/user/Library/charm-6.2.0-net-darwin-x86_64/tests/charm++/simplearrayhello/hello" at "/Users/user/Library/charm-6.2.0-net-darwin-x86_64/tests/charm++/simplearrayhello" for 2.
Charmrun> Starting ssh node_A_IP_address -l user /bin/sh -f
Charmrun> remote shell (node_A_IP_address:2) started
Charmrun> node programs all started
Charmrun remote shell(localhost.0)> remote responding...
Charmrun remote shell(localhost.0)> starting node-program...
Charmrun remote shell(localhost.0)> rsh phase successful.
Charmrun remote shell(node_B_name.1)> remote responding...
Charmrun remote shell(node_B_name.1)> starting node-program...
Charmrun remote shell(node_B_name.1)> rsh phase successful.
Charmrun remote shell(node_A_IP_address.2)> remote responding...
Charmrun remote shell(node_A_IP_address.2)> starting node-program...
Charmrun remote shell(node_A_IP_address.2)> rsh phase successful.
Charmrun> Waiting for 0-th client to connect.
Charmrun> Waiting for 1-th client to connect.
Charmrun> Waiting for 2-th client to connect.
Charmrun> client 0 connected (IP=127.0.0.1 data_port=60925)
Charmrun> client 1 connected (IP=node_B_IP_address data_port=59590)
Charmrun> client 2 connected (IP=node_A_IP_address data_port=49701)
Charmrun> All clients connected.
Charmrun> IP tables sent.
Charmrun> node programs all connected
Charm++: scheduler running in netpoll mode.
Charm++> cpu topology info is being gathered.
could not lookup DNS configuration info service: (ipc/send) invalid destination port
could not lookup DNS configuration info service: (ipc/send) invalid destination port
could not lookup DNS configuration info service: (ipc/send) invalid destination port
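To make the two runs easier to compare, here is a small Python sketch that summarizes a ++verbose log: which clients were added, under which IP each one was registered, and whether the "Graceful exit" line was reached. The regular expressions are written against the log lines quoted in this message only; the function name summarize and the shortened SAMPLE text are hypothetical, and other Charm++ versions may print different messages.

```python
import re

# Patterns modelled on the log lines quoted in this message (assumption:
# other charmrun versions may format these lines differently).
CLIENT_ADDED = re.compile(r'Charmrun> adding client (\d+): "([^"]+)", IP:(\S+)')
CLIENT_CONNECTED = re.compile(
    r"Charmrun> client (\d+) connected \(IP=(\S+) data_port=(\d+)\)"
)


def summarize(log_text):
    """Collect per-client host, IP and data port from a verbose charmrun log."""
    clients = {}
    for line in log_text.splitlines():
        m = CLIENT_ADDED.search(line)
        if m:
            clients[int(m.group(1))] = {
                "host": m.group(2), "ip": m.group(3), "data_port": None,
            }
            continue
        m = CLIENT_CONNECTED.search(line)
        if m:
            entry = clients.setdefault(
                int(m.group(1)),
                {"host": None, "ip": m.group(2), "data_port": None},
            )
            entry["data_port"] = int(m.group(3))
    return {
        "clients": clients,
        "graceful_exit": "Charmrun> Graceful exit." in log_text,
    }


# A shortened excerpt of the +p3 output above.
SAMPLE = """\
Charmrun> adding client 0: "localhost", IP:127.0.0.1
Charmrun> adding client 1: "node_B_name", IP:node_B_IP_address
Charmrun> adding client 2: "node_A_IP_address", IP:node_A_IP_address
Charmrun> client 0 connected (IP=127.0.0.1 data_port=60925)
Charmrun> client 1 connected (IP=node_B_IP_address data_port=59590)
Charmrun> client 2 connected (IP=node_A_IP_address data_port=49701)
Charmrun> All clients connected.
"""

if __name__ == "__main__":
    summary = summarize(SAMPLE)
    for rank, info in sorted(summary["clients"].items()):
        print(rank, info["host"], info["ip"], info["data_port"])
    print("graceful exit:", summary["graceful_exit"])
```

For the +p3 run this reports three connected clients, with client 0 registered as 127.0.0.1, and no graceful exit, which matches what I see: every process connects back to charmrun, but the program then hangs.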
I would be very grateful if someone could explain to me why I cannot run the simplearrayhello program on more than one machine.
Thank you for your attention.
Best regards,
Chiara Orsini