charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
- From: Phil Miller <mille121 AT illinois.edu>
- To: Steve Petruzza <spetruzza AT sci.utah.edu>
- Cc: charm <charm AT lists.cs.illinois.edu>
- Subject: Re: [charm] Scalability issues using large chare array
- Date: Mon, 1 Aug 2016 12:04:11 -0500
Hi Steve,
I'm going to address your message in two separate parts, because they deal with very different issues.
On Mon, Aug 1, 2016 at 7:44 AM, Steve Petruzza <spetruzza AT sci.utah.edu> wrote:
If I run on 1024 cores I get the following at startup:

Charm++> Running on Gemini (GNI) with 1024 processes
Charm++> static SMSG
Charm++> SMSG memory: 5056.0KB
Charm++> memory pool init block size: 8MB, total memory pool limit 0MB (0 means no limit)
Charm++> memory pool registered memory limit: 200000MB, send limit: 100000MB
Charm++> only comm thread send/recv messages
Charm++> Cray TLB page size: 2048K
Charm++> Running in SMP mode: numNodes 1024, 1 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.7.0-281-g8d5cdd9
Warning> using Isomalloc in SMP mode, you may need to run with '+isomalloc_sync'.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 64 unique compute nodes (16-way SMP).
Charm++> Warning: the number of SMP threads (32) is greater than the number of physical cores (16), so threads will sleep while idling. Use +CmiSpinOnIdle or +CmiSleepOnIdle to control this directly.
WARNING: +p1024 is a command line argument beginning with a '+' but was not parsed by the RTS.
If any of the above arguments were intended for the RTS you may need to recompile Charm++ with different options.
…I’m running using:

aprun -n 1024 -N 16 ./charm_app +p1024

and charm is built as:

./build charm++ gni-crayxe smp -j16 --with-production

If I add +ppn16 (or 15 or less) to the charm_app, the number of SMP threads multiplies by that factor, so I don’t know how to remove that warning (the number of SMP…).
On Cray systems, the -n argument to the aprun command indicates how many processes the system should launch. This has a few implications that all play into your observations:
- Because aprun is doing the process launching rather than the charmrun utility that we'd use on a commodity cluster, there is nothing to process the +p argument. It would be meaningless in this context.
- An 'smp' build of Charm++ runs at least two threads in each process: a communication thread, and one or more worker threads. So, aprun launches 16 processes on each 16-core node, and each of those processes has two threads.
- The +ppn argument sets the number of worker threads to spawn per process, and so multiplies the oversubscription as you noted
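To make the oversubscription concrete: with aprun -n 1024 -N 16 and an smp build, each of the 16 processes on a node runs one worker thread plus one communication thread, which is where the warning's figure of 32 threads on 16 physical cores comes from. A quick sketch of that arithmetic:

```shell
# Thread count per node under the original launch parameters.
procs_per_node=16        # from: aprun -N 16
threads_per_proc=2       # smp build: 1 worker + 1 communication thread
physical_cores=16

total_threads=$(( procs_per_node * threads_per_proc ))
echo "$total_threads threads on $physical_cores cores"   # 32 threads on 16 cores
```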
What you want is for aprun to launch a smaller number of processes (e.g. 1 or 2 per node) and for the Charm RTS to spawn threads on each core within the bounds of those processes. Here's what that would look like:
One process per node:
aprun -n 64 -N 1 ./charm_app +ppn15
Two processes per node:
aprun -n 128 -N 2 ./charm_app +ppn7
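The +ppn values in these commands follow from reserving one core per process for its communication thread, with the remaining cores becoming worker threads. A minimal sketch of that calculation (the ppn_for helper is just for illustration, not part of Charm++):

```shell
# Given a fixed core count per node, compute the +ppn worker-thread count that
# leaves one core per process free for the communication thread.
cores_per_node=16

ppn_for() {
  local procs_per_node=$1
  echo $(( cores_per_node / procs_per_node - 1 ))
}

ppn_for 1   # 15 -> aprun -n 64  -N 1 ./charm_app +ppn15
ppn_for 2   # 7  -> aprun -n 128 -N 2 ./charm_app +ppn7
```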
Alternatively, if your application would rather use every core for computation, without a dedicated communication thread or the benefit of shared-memory communication within each node, you could build without the smp option and then run almost as you did initially:
aprun -n 1024 -N 16 ./charm_app
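For completeness, that non-smp build would presumably be the build line you quoted with the smp option dropped (an untested sketch based on your original build command):

```shell
# Same as the original build command, minus the 'smp' option:
./build charm++ gni-crayxe -j16 --with-production
```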
We're working on code to automate a lot of this process and thread launching tedium in a near-future release. I don't think we'll be able to soundly and automatically back down from explicitly commanded oversubscription, though.
Phil