charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
List archive
- From: Elliott Slaughter <slaughter AT cs.stanford.edu>
- To: "Van Der Wijngaart, Rob F" <rob.f.van.der.wijngaart AT intel.com>
- Cc: Sam White <white67 AT illinois.edu>, Phil Miller <mille121 AT illinois.edu>, "Kale, Laxmikant V" <kale AT illinois.edu>, "Chandrasekar, Kavitha" <kchndrs2 AT illinois.edu>, "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
- Subject: Re: [charm] Introduction
- Date: Fri, 20 Oct 2017 14:44:03 -0700
To follow up on my last email, here is a mystery I can't explain. With the PRK Stencil code and the configuration from my last email, Charm++ seems to get nearly 2x the performance of MPI on a single node, even with an overdecomposition factor of 1. I'm fairly certain that I've configured the two as closely as possible: both use Intel 17.0.4, both use -O3, same grid size, same number of PEs, etc. The problem size is quite generous, so the impact of the programming model in general should be minimal, and nearly all of the time should be spent in the kernels. I'm attaching some sample outputs to this email in case you can spot any differences.
Do any of you know if there are any known differences between the MPI and Charm++ stencil codes? I noticed, for example, that the Charm++ version doesn't respond to the DOUBLE define, but it seems to be hard-coded to double precision, so I don't think it should be an issue. Otherwise I'm having a hard time seeing what could cause such a large difference at this problem size. I've worked with the MPI versions of the PRK codes for some time, so I'm fairly certain I'm not mis-configuring them.
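(One quick way to double-check the precision question, assuming a stock PRK checkout; the exact paths and file names below are assumptions and may differ in your tree:

grep -nE "DOUBLE|DTYPE" MPI1/Stencil/stencil.c
grep -rn "double" CHARM++/Stencil/

If the Charm++ source only ever declares double, the DOUBLE define really is moot for this comparison.)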
On Fri, Oct 20, 2017 at 2:33 PM, Elliott Slaughter <slaughter AT cs.stanford.edu> wrote:

Thanks Rob for the introduction. I mostly just wanted to sanity check my configuration to make sure I'm doing things the Right Way (tm).

I downloaded Charm++ 6.8.1 and built with the following command. This is on Piz Daint, a Cray XC40/50 system.

module load PrgEnv-intel   # and unload any other PrgEnv-*
module load craype-hugepages8M
./build charm++ gni-crayxc smp --with-production -j8

I wasn't sure about the SMP part, but Rob had talked about Charm++ having a dedicated core for communication, and I think this is the setting I need to get that configuration.

I set CHARMTOP inside PRK's make.defs file, but otherwise left the settings the same as the other apps. (I.e. -O3 and so on.)

My run command looks like the following, where $n is the number of nodes and $d is the decomposition factor.

srun -n $n -N $n --ntasks-per-node 1 --cpu_bind none stencil +ppn 10 +setcpuaffinity 100 40000 $d

The nodes have 12 physical cores per node, so this leaves 2 extra cores for whatever extra threads Charm++ wants to use. The stencil code is memory bound, so I've found that even with MPI/OpenMP, filling up all the cores isn't generally beneficial.

If anything about this configuration looks wrong, or if I'm missing any important settings (or there are settings where I should explore the performance impact of different options), please let me know.
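(A minimal sketch of one setting that may be worth exploring, assuming the node layout described above: Charm++ SMP builds accept explicit placement maps for the worker and communication threads via +pemap and +commap, rather than relying on +setcpuaffinity alone. Something along these lines, where the specific core numbers are an assumption about the XC40/50 node rather than a recommendation:

srun -n $n -N $n --ntasks-per-node 1 --cpu_bind none stencil +ppn 10 +setcpuaffinity +pemap 0-9 +commap 10 100 40000 $d

This pins the 10 worker PEs to cores 0-9 and the communication thread to core 10, leaving one physical core idle.)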
--
Elliott Slaughter

"Don't worry about what anybody else is going to do. The best way to predict the future is to invent it." - Alan Kay

On Fri, Oct 20, 2017 at 1:56 PM, Van Der Wijngaart, Rob F <rob.f.van.der.wijngaart AT intel.com> wrote:

Hello Team,
I wanted to introduce you to Elliott Slaughter, a freshly minted PhD in computer science from Stanford, and member of the Legion team. He had some questions for me about optimal choice of configuration, compiler, and runtime parameters when building Charm++ and executing Charm++ workloads, especially the Parallel Research Kernels. I gave some generic advice, but would like to ask you (or those of you who are still at UIUC) to help him optimize his execution environment. Thanks!
Rob
--
Elliott Slaughter
"Don't worry about what anybody else is going to do. The best way to predict the future is to invent it." - Alan Kay
"Don't worry about what anybody else is going to do. The best way to predict the future is to invent it." - Alan Kay
Parallel Research Kernels version 2.17
MPI stencil execution on 2D grid
Number of ranks        = 10
Grid size              = 40000
Radius of stencil      = 2
Tiles in x/y-direction = 2/5
Type of stencil        = star
Data type              = double precision
Compact representation of stencil loop body
Number of iterations   = 100
Solution validates
Rate (MFlops/s): 25690.953422 Avg time (s): 1.183059
Charm++> Running on Gemini (GNI) with 1 processes
Charm++> static SMSG
Charm++> memory pool init block size: 8MB, total memory pool limit 0MB (0 means no limit)
Charm++> memory pool registered memory limit: 200000MB, send limit: 100000MB
Charm++> only comm thread send/recv messages
Charm++> Cray TLB page size: 8192K
Charm++> Running in SMP mode: numNodes 1, 10 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.8.1
CharmLB> Load balancer assumes all CPUs are same.
Charm++> cpu affinity enabled.
Charm++> Running on 1 unique compute nodes (24-way SMP).
Parallel Research Kernels Version 2.17
Charm++ stencil execution on 2D grid
Number of Charm++ PEs   = 10
Overdecomposition       = 1
Grid size               = 40000
Radius of stencil       = 2
Chares in x/y-direction = 2/5
Type of stencil         = star
Compact representation of stencil loop body
Number of iterations    = 100
Solution validates
Rate (MFlops): 44370.295587 Avg time (s) 0.685006
[Partition 0][Node 0] End of program
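(A quick consistency check on the two outputs above, assuming both codes use the usual PRK flop count of (2*stencil_size + 1) flops per active grid point per iteration, with stencil_size = 4*radius + 1 = 9 for the radius-2 star:

flops per iteration = 19 * (40000 - 4)^2 = 19 * 39996^2 ≈ 3.039e10
3.039e10 / 1.183059 s ≈ 25,691 MFlops/s (matches the MPI rate)
3.039e10 / 0.685006 s ≈ 44,370 MFlops/s (matches the Charm++ rate)

So the two codes appear to count flops identically, and the 2x gap lies entirely in the measured time per iteration rather than in how the rate is reported.)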