charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
- From: Steve Petruzza <spetruzza AT sci.utah.edu>
- To: charm <charm AT lists.cs.illinois.edu>
- Subject: [charm] Scalability issues using large chare array
- Date: Mon, 1 Aug 2016 15:44:38 +0300
Hi all,
In my application, the main chare creates a single chare array containing thousands of chares, which eventually execute some tasks and communicate with one another (not all simultaneously).
If I run on 1024 cores I get the following at startup:
Charm++> Running on Gemini (GNI) with 1024 processes
Charm++> static SMSG
Charm++> SMSG memory: 5056.0KB
Charm++> memory pool init block size: 8MB, total memory pool limit 0MB (0 means no limit)
Charm++> memory pool registered memory limit: 200000MB, send limit: 100000MB
Charm++> only comm thread send/recv messages
Charm++> Cray TLB page size: 2048K
Charm++> Running in SMP mode: numNodes 1024, 1 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.7.0-281-g8d5cdd9
Warning> using Isomalloc in SMP mode, you may need to run with '+isomalloc_sync'.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 64 unique compute nodes (16-way SMP).
Charm++> Warning: the number of SMP threads (32) is greater than the number of physical cores (16), so threads will sleep while idling. Use +CmiSpinOnIdle or +CmiSleepOnIdle to control this directly.
WARNING: +p1024 is a command line argument beginning with a '+' but was not parsed by the RTS.
If any of the above arguments were intended for the RTS you may need to recompile Charm++ with different options.
…
I’m running using:
aprun -n 1024 -N 16 ./charm_app +p1024
and Charm++ is built as: ./build charm++ gni-crayxe smp -j16 --with-production
If I add +ppn 16 (or 15, or fewer) to the charm_app command line, the number of SMP threads is multiplied by that factor, so I don't know how to get rid of that warning about the number of SMP threads.
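One thing I may be getting wrong: my understanding (possibly mistaken) is that in SMP mode aprun's -n should count processes rather than PEs, so with my 64 nodes of 16 cores the launch might instead look something like this, leaving one core per node for the communication thread:

```
# Hypothetical launch line (my guess, not verified): one SMP process per
# 16-core node, 15 worker threads plus the comm thread = 16 threads total.
aprun -n 64 -N 1 -d 16 ./charm_app +ppn 15
```

Is that the intended combination of aprun flags and +ppn, or am I still mixing up processes and PEs?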
By the way, if I collect some stats I see something like the following:
Proc 0: [11 created, 11 processed]
Proc 1: [0 created, 0 processed]
Proc 2: [0 created, 0 processed]
Proc 3: [0 created, 0 processed]
… all the others 0,0
Charm Kernel Detailed Statistics (R=requested P=processed):

                Create     Mesgs    Create     Mesgs    Create      Mesgs
                 Chare       for     Group       for Nodegroup        for
  PE R/P         Mesgs    Chares     Mesgs    Groups     Mesgs Nodegroups
---- --- --------- --------- --------- --------- --------- ----------
   0   R            11         0        14         1         8       1024
       P            11      7732        14         2         8          0
   1   R             0         0         0         1         0          0
       P             0         0        14         2         0          1
   2   R             0         0         0         2         0          0
       P             0         0        14         3         0          0
   3   R             0         0         0         2         0          0
       P             0         0        14         3         0          0
… all the others like PE 1,2,3…
Is PE 0 processing all the messages? Why? This does not look scalable.
In fact, when I go over 120K chares it crashes with a segfault (_pmiu_daemon(SIGCHLD): [NID 16939] [c5-0c2s5n1] [Mon Aug 1 03:12:58 2016] PE RANK 975 exit signal Segmentation fault).
Am I building or running improperly?
How can I make sure that the chares are spread across more nodes and processors, to avoid excessive memory allocation on a few nodes?
Is there any strong coupling between the chare that creates a chare array and the nodes/processors where its elements actually execute? If I create several smaller chare arrays in the main chare at different times, instead of one large array at the beginning, would that change anything?
Thank you,
Steve
- [charm] Scalability issues using large chare array, Steve Petruzza, 08/01/2016
- Re: [charm] Scalability issues using large chare array, Phil Miller, 08/01/2016
- Re: [charm] Scalability issues using large chare array, Phil Miller, 08/01/2016
- Re: [charm] Scalability issues using large chare array, Steve Petruzza, 08/02/2016
- Re: [charm] Scalability issues using large chare array, Phil Miller, 08/03/2016
- Re: [charm] Scalability issues using large chare array, Steve Petruzza, 08/04/2016
- Re: [charm] Scalability issues using large chare array, Phil Miller, 08/04/2016
- Re: [charm] Scalability issues using large chare array, Steve Petruzza, 08/08/2016
- Re: [charm] Scalability issues using large chare array, Phil Miller, 08/04/2016
- Re: [charm] Scalability issues using large chare array, Steve Petruzza, 08/04/2016
- Re: [charm] Scalability issues using large chare array, Phil Miller, 08/03/2016
- Re: [charm] Scalability issues using large chare array, Steve Petruzza, 08/02/2016