charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
List archive
- From: Sam White <samt.white AT gmail.com>
- To: Leonardo Duarte <leo.duarte AT gmail.com>
- Cc: Scott Field <sfield AT astro.cornell.edu>, Charm Mailing List <charm AT cs.illinois.edu>
- Subject: Re: [charm] Using Charm AMPI
- Date: Thu, 29 Oct 2015 12:31:56 -0500
Hi Leonardo,
This looks like a problem related to your run commands rather than a compiler problem. AMPI will run with both CCE and GCC; as with any application, performance should generally be similar at similar levels of optimization.
If I load the craype-hugepages8M and PrgEnv-X (either cray or gnu) modules, then build AMPI with:
./build AMPI gni-crayxe smp -j8 --with-production -g
Then, to run an AMPI program 'pgm' on 2 nodes in SMP mode with 1 thread per core (16 cores/node, ignoring the hyperthreads), you would do:
aprun -n 2 ./pgm +ppn 15 +pemap 1-15 +commap 0 +vp 120
The +ppn and related options are necessary to tell AMPI (and Charm++'s runtime system, on top of which AMPI runs) how many threads/processes to create; AMPI does accept these options, and in SMP mode you need them to specify the number of worker threads per OS process. The above command creates 2 OS processes, each with 15 worker threads and 1 dedicated communication thread. Each worker thread hosts 4 virtual processors (120 MPI ranks total).
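As a hypothetical variant (the 32-core node count, -d value, and +vp total below are illustrative assumptions, not figures from this thread), the same layout on nodes with 32 cores might look like:
# Sketch only: 2 OS processes, one per assumed 32-core node.
# 31 worker threads per process on cores 1-31, comm thread pinned to core 0.
# 2 x 31 = 62 worker threads total; +vp 124 then gives 2 MPI ranks per worker thread.
aprun -n 2 -N 1 -d 32 ./pgm +ppn 31 +pemap 1-31 +commap 0 +vp 124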
Let me know if you have any other questions, and if this doesn't help, build Charm++ and your application with the '-g' option and include a stack trace of the failure if possible.
- Sam
On Thu, Oct 29, 2015 at 12:14 PM, Leonardo Duarte <leo.duarte AT gmail.com> wrote:
Hello Scott, thanks for your help.
I also used the swap to PrgEnv-gnu, the hugepages8M, rca, and the persistent option to build charm. It worked, but it was extremely slow. A simple example runs in seconds on my laptop with 2 processors (simulating 2 nodes) and runs in 10 min on 2 nodes of BW. Of course I was expecting it to be slower, but not this much. That's why I decided to use the PrgEnv-cray environment; it's the native one.
AMPI does not support +ppn 30; it takes the values from the aprun parameters.
My startup line with only 2 nodes and only 1 worker thread per process is not wrong. Since I was having trouble running it, I simplified the example to understand better what was going on.
However, it's good to know that your application uses PrgEnv-gnu. I was worried that mine was too slow because I was using it, or because I was missing something in the build.
I really want to make it work with PrgEnv-cray right now, but I don't know what I'm doing wrong.
Thanks for your answer!
Leonardo.
On Thu, Oct 29, 2015 at 11:00 AM, Scott Field <sfield AT astro.cornell.edu> wrote:
Hi Leonardo,
I have a charm++ application running on blue waters, and hopefully some of this will carry over to AMPI.
In addition to the default blue waters environment, I use
module swap PrgEnv-cray PrgEnv-gnu/5.2.40
module load craype-hugepages2M
module load rca
and my charm++ build includes the option "persistent". To launch the application I do
aprun -n 2 -r 1 -N 1 -d 31 ./ExecutableName +ppn 30 +pemap 1-30 +commap 0
On startup, my charm++ output looks different from yours. In particular, I see
"Charm++> Running in SMP mode: numNodes 2, 30 worker threads per process"
while yours reads
"Charm++> Running in SMP mode: numNodes 2, 1 worker threads per process"
These differences may or may not explain the errors you see. Hopefully it helps. Good luck!
Scott
On Thu, Oct 29, 2015 at 1:58 AM, Leonardo Duarte <leo.duarte AT gmail.com> wrote:
Hello Everyone,
I'm a PhD student in the CEE department at UIUC and I would really appreciate it if anyone could help me with Charm.
I'm trying to run my code on Blue Waters, and I'm using a library that uses Charm++ AMPI. I was able to build and run everything correctly, but extremely slowly, with PrgEnv-gnu. Now I'm trying to use the native Cray environment.
I'm using this BW environment and modules:
PrgEnv-cray
module load craype-hugepages8M
module load rca
I built charm with this command line:
./build LIBS gni-crayxe craycc smp -j16 --with-production --build-shared -O3
My code is composed of many shared libraries that are loaded dynamically by the application using dlopen, dlsym, etc.
I'm able to build my code using these command lines in my makefiles:
To compile code that does not use Charm:
CC -c -fPIC -O2 -I../../core/include -I../../tecgraf/tops/include -o ../../obj/obj64/linear/Linux3/linear.o ../../plugins/behavior/linear/linear.cpp
To link code that does not use Charm:
CC -shared -Wl,-soname,liblinear.so.1 -o liblinear.so.1.0 ../../obj/obj64/linear/Linux3/linear.o -L../../tecgraf/tops/lib64/Linux3 -ltops -L../../bin/lib64/Linux3 -ltopsim
To compile code that uses Charm:
charmc -language model -c -fPIC -O2 -I../../core/include -I../../tecgraf/tops/include -I../../tecgraf/tops/include/vis -I../../../bin/charm/include -o ../../obj/obj64/parebepcg/Linux3/parebepcg.o ../../plugins/linearsystem/ebepcg/parebepcg.cpp
To link code that uses Charm:
charmc -shared -language ampi -Wl,-soname,libparebepcg.so.1 -o libparebepcg.so.1.0 ../../obj/obj64/parebepcg/Linux3/parebepcg.o -L../../tecgraf/tops/lib64/Linux3 -lpartops -ltopsrd -ltops -L../../bin/lib64/Linux3 -lpartopsim
To compile my app:
charmc -language model -c -fPIC -O2 -I../../core/include -I../../tecgraf/tops/include -I../../tecgraf/tops/include/vis -I../../plugins -o ../../obj/obj64/partopsimapp/partopsimapp/Linux3/parmain.o ../../tests/app/parmain.cpp
To link my app:
charmc -language ampi -dynamic -o ../../bin/lib64/Linux3/partopsimapp ../../obj/obj64/partopsimapp/partopsimapp/Linux3/parmain.o -L../../tecgraf/tops/lib64/Linux3 -lpartops -ltopsrd -ltops -L../../bin/lib64/Linux3 -lpartopsim -lpartopsimlib -Wl,--no-as-needed -ldl
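Since the crash comes right after the plugins load, one sanity check worth trying (a suggestion with illustrative paths, not a step from the original thread) is to verify that the executable and each plugin resolve all of their shared-library dependencies, and against a single charm build:
# Look for "not found" entries, or libraries pulled from the wrong charm build
# (paths below are illustrative; point them at your actual build outputs):
ldd ../../bin/lib64/Linux3/partopsimapp
ldd liblinear.so.1.0 libparebepcg.so.1.0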
This is the error that I get:
_pmiu_daemon(SIGCHLD): [NID 16828] [c19-9c1s1n0] [Thu Oct 29 00:35:04 2015] PE RANK 0 exit signal Segmentation fault
[NID 16828] 2015-10-29 00:35:04 Apid 28607883: initiated application termination
_pmiu_daemon(SIGCHLD): [NID 16829] [c19-9c1s1n1] [Thu Oct 29 00:35:04 2015] PE RANK 1 exit signal Segmentation fault
I put some extra info at the end of the email in case you need it. I read a lot of things on the internet and I've been trying a lot, but now I think I need some help. Am I missing something? Is this the correct way to handle it? I really appreciate any suggestions.
Thank you.
Leonardo.
Extra info
These are my environment variables:
echo $PATH
.:/u/psp/duarte/bin/lua5:/u/psp/duarte/bin/tolua5:/u/psp/duarte/bin/charm/gni-crayxe-smp-craycc/bin:/u/psp/duarte/bin/charm/gni-crayxe-persistent-smp/bin:/sw/xe/darshan/2.3.0/darshan-2.3.0_cle52/bin:/sw/admin/scripts:/sw/user/scripts:/sw/xe/altd/bin:/usr/local/gsi-openssh-6.2p2-2/bin:/opt/java/jdk1.7.0_45/bin:/usr/local/globus-5.2.4/bin:/usr/local/globus-5.2.4/sbin:/opt/moab/8.1/bin:/opt/moab/8.1/sbin:/opt/torque/5.0.2-bwpatch/sbin:/opt/torque/5.0.2-bwpatch/bin:/opt/cray/mpt/7.2.0/gni/bin:/opt/cray/rca/1.0.0-2.0502.53711.3.125.gem/bin:/opt/cray/alps/5.2.1-2.0502.9041.11.6.gem/sbin:/opt/cray/alps/5.2.1-2.0502.9041.11.6.gem/bin:/opt/cray/dvs/2.5_0.9.0-1.0502.1873.1.142.gem/bin:/opt/cray/xpmem/0.1-2.0502.55507.3.2.gem/bin:/opt/cray/dmapp/7.0.1-1.0502.9501.5.211.gem/bin:/opt/cray/pmi/5.0.6-1.0000.10439.140.3.gem/bin:/opt/cray/ugni/5.0-1.0502.9685.4.24.gem/bin:/opt/cray/udreg/2.3.2-1.0502.9275.1.25.gem/bin:/opt/cray/cce/8.3.10/cray-binutils/x86_64-unknown-linux-gnu/bin:/opt/cray/cce/8.3.10/craylibs/x86-64/bin:/opt/cray/cce/8.3.10/cftn/bin:/opt/cray/cce/8.3.10/CC/bin:/opt/cray/craype/2.3.0/bin:/opt/cray/eslogin/eswrap/1.1.0-1.020200.1231.0/bin:/opt/modules/3.2.10.3/bin:/u/psp/duarte/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin:/usr/lib/qt3/bin:/opt/cray/bin
echo $LD_LIBRARY_PATH
.:/u/psp/duarte/topsim/bin/lib64/Linux3:/u/psp/duarte/topsim/bin/libd64/Linux3:/u/psp/duarte/bin/charm/gni-crayxe-smp-craycc/lib_so:/u/psp/duarte/bin/charm/gni-crayxe-smp-craycc/lib:/u/psp/duarte/bin/charm/gni-crayxe-persistent-smp/lib:/u/psp/duarte/lib:/sw/xe/darshan/2.3.0/darshan-2.3.0_cle52/lib:/usr/local/globus-5.2.4/lib64:/usr/local/globus/lib64
My app output:
Charm++> memory pool registered memory limit: 200000MB, send limit: 100000MB
Charm++> only comm thread send/recv messages
Charm++> Cray TLB page size: 8192K
Charm++> Running in SMP mode: numNodes 2, 1 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.6.1-0-g74a2cc5
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 2 unique compute nodes (32-way SMP).
*** Topsim 0.1.0 ***
[0] topParInit() registered
[0] TopParContext created: 0!
[0] topParInit() array created
[1] TopParContext created: 1!
[1] topParInit() registered
[1] topParInit() array created
[0] topParInit() done!
[1] topParInit() done!
[0] PARTOPS: Slave started at processor 0, node: 0, rank: 0.
[0] PARTOPS: MODEL CREATED! rank: 0
[1] PARTOPS: Slave started at processor 1, node: 1, rank: 0.
[1] PARTOPS: MODEL CREATED! rank: 0
Plugin loaded libparebepcg.so
Plugin loaded libpartreader.so
Plugin loaded libisotropic.so
Plugin loaded liblinear.so
Plugin loaded libparsimp.so
Plugin loaded libbrick.so
Plugin loaded libpartreader.so
Plugin loaded libparebepcg.so
Plugin loaded libparloadcontrol.so
Plugin loaded libparwriter.so
Plugin loaded libparsimp.so
Plugin loaded libparjacobi.so
Plugin loaded libbrick.so
Plugin loaded libparwriter.so
Plugin loaded liblinear.so
Plugin loaded libisotropic.so
Plugin loaded libparloadcontrol.so
Plugin loaded libparjacobi.so
Application 28607883 exit codes: 139
Application 28607883 resources: utime ~2s, stime ~2s, Rss ~15384, inblocks ~10927, outblocks ~18489
Thu Oct 29 00:35:04 CDT 2015
This is my PBS script
#!/bin/bash
### set the number of nodes
### set the number of PEs per node
#PBS -l nodes=2:ppn=1:xe
### set the wallclock time
#PBS -l walltime=00:20:00
### set the job name
#PBS -N topsim
### set the job stdout and stderr
#PBS -e topsim.err
#PBS -o topsim.out
### set email notification
#PBS -m bea
#PBS -M leo.duarte AT gmail.com
### In case of multiple allocations, select which one to charge
##PBS -A xyz
# NOTE: lines that begin with "#PBS" are not interpreted by the shell but ARE
# used by the batch system, whereas lines that begin with multiple # signs,
# like "##PBS", are considered "commented out" by the batch system
# and have no effect.
# If you launched the job in a directory prepared for the job to run within,
# you'll want to cd to that directory
# [uncomment the following line to enable this]
cd $PBS_O_WORKDIR
# Alternatively, the job script can create its own job-ID-unique directory
# to run within. In that case you'll need to create and populate that
# directory with executables and perhaps inputs
# [uncomment and customize the following lines to enable this behavior]
# mkdir -p /scratch/sciteam/$USER/$PBS_JOBID
# cd /scratch/sciteam/$USER/$PBS_JOBID
# cp /scratch/job/setup/directory/* .
# To add certain modules that you do not have added via ~/.modules
. /opt/modules/default/init/bash # NEEDED to add module commands to shell
#module swap PrgEnv-cray PrgEnv-gnu
module add craype-hugepages8M
module add rca
#export CRAY_ROOTFS=DSL
echo $LD_LIBRARY_PATH
#export APRUN_XFER_LIMITS=1 # to transfer shell limits to the executable
### launch the application
### redirecting stdin and stdout if needed
### NOTE: (the "in" file must exist for input)
# used for timing
date
aprun -n2 -N1 ./partopsimapp ../../../tests/data/input/config/plugins_simp_parebepcg_jacobi_brick.lua ../../../tests/data/input/examples/CantSymm/CantSymm12_2.pos ../../../tests/data/output/CantSymm12_2_result.pos
# used for timing
date
### For more information see the man page for aprun
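For completeness, a script like this would be submitted with qsub (the filename here is illustrative):
qsub topsim.pbs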
- [charm] Using Charm AMPI, Leonardo Duarte, 10/29/2015
- Re: [charm] Using Charm AMPI, Scott Field, 10/29/2015
- Re: [charm] Using Charm AMPI, Leonardo Duarte, 10/29/2015
- Re: [charm] [ppl] Using Charm AMPI, Jim Phillips, 10/29/2015
- Re: [charm] Using Charm AMPI, Sam White, 10/29/2015
- Re: [charm] Using Charm AMPI, Leonardo Duarte, 10/29/2015
- Re: [charm] [ppl] Using Charm AMPI, Jim Phillips, 10/29/2015
- Re: [charm] [ppl] Using Charm AMPI, Leonardo Duarte, 10/30/2015
- Re: [charm] [ppl] Using Charm AMPI, Scott Field, 10/30/2015
- Re: [charm] [ppl] Using Charm AMPI, Sam White, 10/30/2015
- Re: [charm] [ppl] Using Charm AMPI, Phil Miller, 10/30/2015
- Re: [charm] [ppl] Using Charm AMPI, Jim Phillips, 10/30/2015
- Re: [charm] [ppl] Using Charm AMPI, Sam White, 10/30/2015
- Re: [charm] [ppl] Using Charm AMPI, Leonardo Duarte, 10/30/2015
- Re: [charm] Using Charm AMPI, Scott Field, 10/29/2015