charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
- From: François TESSIER <francois.tessier AT inria.fr>
- To: Phil Miller <mille121 AT illinois.edu>
- Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>, Gengbin Zheng <zhenggb AT gmail.com>
- Subject: Re: [charm] CkLoop for a load balancer
- Date: Fri, 18 Oct 2013 19:18:56 +0200
- List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
- List-id: CHARM parallel programming system <charm.cs.uiuc.edu>
When I run the application with the run
command above, it crashes (see attachment) or, sometimes, nothing
happens. If I run it with +p8, it works perfectly (the
application runs fine and the load balancer is executed in
parallel) but, of course, only on the first node...

So, what doesn't work is executing kNeighbor on 8 (or more) nodes, with 8 processes per node, while running my parallel load balancer on the master node every n iterations.

Thanks for your help

François

--
François TESSIER
PhD Student at University of Bordeaux
Inria - Runtime Team
Tel : 0033.5.24.57.41.52
francois.tessier AT inria.fr
PGP 0x8096B5FA

On 18/10/2013 17:23, Phil Miller wrote:

Please be more specific - what were the *problems* that you
actually encountered? Everything you described seems to be
reasonable.
Did it crash? Did it hang? Did the load balancer not run in
parallel? Did you get unexpected output? What happened that was
wrong?

On Fri, Oct 18, 2013 at 4:05 AM, François Tessier <francois.tessier AT inria.fr> wrote:
Hi!

With the help of some of you, I wrote a parallel load balancer using CkLoop, but I am running into problems when I run an application with this load balancer. For example, I am trying to run experiments with kNeighbor, and I proceeded as follows:

- Build charm++ : ./build charm++ mpi-linux-x86_64-smp --with-production -j
- Go to tmp/libs/ck-libs/ckloop then make
- Compile kNeighbor with -module CkLoop

All these steps succeeded. My run command looks like this:

./charmrun +p64 -machinefile ~/machinefile ./kNeighbor +ppn8 64 50 262144 10 +balancer TreeMatchLB +LBDebug 1 +setcpuaffinity +pemap 0-7 +CmiSleepOnIdle

The target platform contains 8 nodes with 8 cores each. I would like to run kNeighbor on 64 processes and parallelize only the load balancing with CkLoop.

Do you have any suggestions?

Thanks

François

--
___________________
François TESSIER
PhD Student at University of Bordeaux
Inria - Runtime Team
Tel : 0033.5.24.57.41.52
francois.tessier AT inria.fr
http://runtime.bordeaux.inria.fr/ftessier/
PGP 0x8096B5FA
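
For readers following the run command, here is a minimal sketch of how the +p and +ppn values relate in this run. It assumes, based only on the attached log (charmrun launches "mpirun -np 8" and Charm++ reports "numNodes 8, 8 worker threads per process"), that +p is the total number of worker threads and +ppn the number of worker threads per process. The function name smp_layout is hypothetical and is not part of Charm++.

```python
import shlex


def smp_layout(command):
    """Return (total worker threads, threads per process, processes).

    Assumption: +pN is the total worker-thread count and +ppnN the
    worker threads per OS process, as the attached log suggests.
    """
    total = ppn = None
    for tok in shlex.split(command):
        if tok.startswith("+ppn") and tok[4:].isdigit():
            ppn = int(tok[4:])
        elif tok.startswith("+p") and tok[2:].isdigit():
            # Skips options such as +pemap, whose suffix is not a number.
            total = int(tok[2:])
    if total is None or ppn is None:
        raise ValueError("command must contain both +pN and +ppnN")
    if total % ppn != 0:
        raise ValueError("+p should be a multiple of +ppn")
    return total, ppn, total // ppn


cmd = ("./charmrun +p64 -machinefile ~/machinefile ./kNeighbor +ppn8 "
       "64 50 262144 10 +balancer TreeMatchLB +LBDebug 1 "
       "+setcpuaffinity +pemap 0-7 +CmiSleepOnIdle")
total, ppn, processes = smp_layout(cmd)
print(total, ppn, processes)
```

Under that assumption the command asks for 64 worker threads in 8 processes of 8 threads each, which agrees with the "-np 8" that charmrun passes to mpirun in the log.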
Running on 8 processors: -machinefile /home/tessier/machinefile_par
./kNeighbor +ppn8 64 50 262144 10 +balancer TreeMatchLB +LBDebug 1
+setcpuaffinity +pemap 0-7 +CmiSleepOnIdle
charmrun> mpirun -np 8 -machinefile /home/tessier/machinefile_par
./kNeighbor +ppn8 64 50 262144 10 +balancer TreeMatchLB +LBDebug 1
+setcpuaffinity +pemap 0-7 +CmiSleepOnIdle
Charm++> Running on MPI version: 2.1
Charm++> level of thread support used: MPI_THREAD_FUNNELED (desired:
MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: numNodes 8, 8 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.5.0-beta1-959-g7414d2b
CharmLB> Verbose level 1, load balancing period: 0.5 seconds
CharmLB> Load balancer assumes all CPUs are same.
Charm++> cpu affinity enabled.
Charm++> cpuaffinity PE-core map : 0-7
Charm++> Running on 8 unique compute nodes (8-way SMP).
Charm++> cpu topology info is gathered in 0.233 seconds.
[0] TreeMatchLB created
CkLoopLib is used in SMP with a simple dynamic scheduling (converse-level
notification) but not using node-level queue
Starting kNeighbor ...
CkLoopLib is used in SMP with a simple dynamic scheduling (converse-level
notification) but not using node-level queue
[the same two-line CkLoopLib message is repeated 62 more times]
[fourmi052:21146] *** Process received signal ***
[fourmi052:21146] Signal: Segmentation fault (11)
[fourmi052:21146] Signal code: Address not mapped (1)
[fourmi052:21146] Failing at address: 0x30
[fourmi052:21146] [ 0] /lib64/libpthread.so.0 [0x7ffff7bd2a90]
[fourmi052:21146] [ 1] ./kNeighbor(_ZN16FuncSingleHelperC1Ei+0x152) [0x4c0380]
[fourmi052:21146] [ 2]
./kNeighbor(_ZN24CkIndex_FuncSingleHelper32_call_FuncSingleHelper_marshall1EPvS0_+0x8f)
[0x4c17ed]
[fourmi052:21146] [ 3] ./kNeighbor(CkDeliverMessageFree+0x31) [0x509f71]
[fourmi052:21146] [ 4] ./kNeighbor(_Z15_processHandlerPvP11CkCoreState+0xc3f)
[0x50ef0f]
[fourmi052:21146] [ 5] ./kNeighbor(CsdScheduleForever+0x48) [0x5b0a18]
[fourmi052:21146] [ 6] ./kNeighbor(CsdScheduler+0x2d) [0x5b0c9d]
[fourmi052:21146] [ 7] ./kNeighbor [0x5aec18]
[fourmi052:21146] [ 8] ./kNeighbor [0x5aecbb]
[fourmi052:21146] [ 9] /lib64/libpthread.so.0 [0x7ffff7bcb070]
[fourmi052:21146] [10] /lib64/libc.so.6(clone+0x6d) [0x7ffff61c710d]
[fourmi052:21146] *** End of error message ***
[fourmi049:15495] [[26864,0],0]-[[26864,1],0] mca_oob_tcp_msg_recv: readv
failed: Connection reset by peer (104)
--------------------------------------------------------------------------
mpirun noticed that process rank 3 with PID 21146 on node fourmi052 exited on
signal 11 (Segmentation fault).
--------------------------------------------------------------------------
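
To help read the backtrace above, here is a minimal sketch that pulls the symbol out of one frame line and demangles the narrow case seen in frame 1. It assumes the Itanium C++ ABI name mangling used by GCC on Linux and handles only a constructor of a single class taking built-in parameter types. The names FRAME, parse_frame and demangle_ctor are hypothetical; a general tool such as c++filt covers the full grammar.

```python
import re

# Matches one frame such as:
# [host:pid] [ 1] ./kNeighbor(_ZN16FuncSingleHelperC1Ei+0x152) [0x4c0380]
FRAME = re.compile(
    r"\[\s*(\d+)\]\s+\S+\((\w+)\+0x([0-9a-f]+)\)\s+\[0x([0-9a-f]+)\]"
)

# Small subset of the built-in type codes in the Itanium mangling scheme.
BUILTIN_TYPES = {"v": "void", "i": "int", "c": "char", "d": "double"}


def parse_frame(line):
    """Return (frame index, symbol, offset, address), or None if no match."""
    m = FRAME.search(line)
    if m is None:
        return None
    return int(m.group(1)), m.group(2), int(m.group(3), 16), int(m.group(4), 16)


def demangle_ctor(symbol):
    """Demangle only _ZN<len><Class>C1E<builtin params>; otherwise None."""
    m = re.match(r"_ZN(\d+)", symbol)
    if m is None:
        return None
    length = int(m.group(1))
    name = symbol[m.end():m.end() + length]
    rest = symbol[m.end() + length:]
    # C1/C2 mark constructors; E closes the nested name.
    if len(name) != length or not rest.startswith(("C1E", "C2E")):
        return None
    params = rest[3:]
    if not params or any(p not in BUILTIN_TYPES for p in params):
        return None
    args = "" if params == "v" else ", ".join(BUILTIN_TYPES[p] for p in params)
    return f"{name}::{name}({args})"


line = ("[fourmi052:21146] [ 1] "
        "./kNeighbor(_ZN16FuncSingleHelperC1Ei+0x152) [0x4c0380]")
index, symbol, offset, address = parse_frame(line)
print(index, demangle_ctor(symbol), hex(offset))
```

Read this way, the segmentation fault on rank 3 (node fourmi052) occurs inside the FuncSingleHelper constructor. The failing address 0x30 would be consistent with accessing a field through a null pointer, though the trace alone does not establish that.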