charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
List archive
- From: Jim Phillips <jim AT ks.uiuc.edu>
- To: Leonardo Duarte <leo.duarte AT gmail.com>
- Cc: Scott Field <sfield AT astro.cornell.edu>, Charm Mailing List <charm AT cs.illinois.edu>
- Subject: Re: [charm] [ppl] Using Charm AMPI
- Date: Thu, 29 Oct 2015 16:14:51 -0500 (CDT)
Be sure you are explicitly setting +commap and +pemap. If you don't do this you can end up with all of your threads on the same core.
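For example, with one process per node and a single worker thread per process (the configuration quoted below), a launch line along these lines pins the worker thread to core 1 and the communication thread to core 0. This is an illustrative sketch only, not a line taken from this thread, and "[input arguments]" is a placeholder for the application's own arguments:

    aprun -n 2 -N 1 -d 2 ./partopsimapp +pemap 1 +commap 0 [input arguments]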
I also recommend PrgEnv-gnu, as the underlying gcc compilers have thousands of times as many users as the Cray compilers. The Cray Fortran compiler is no doubt decades ahead of gnu, but not the C++ compiler, and I wouldn't be surprised if the Cray malloc makes assumptions that destroy Charm++ performance.
Jim
On Thu, 29 Oct 2015, Leonardo Duarte wrote:
Hello Scott, thanks for your help.
I also used the swap to PrgEnv-gnu, the hugepages8M, rca, and the
persistent option to build charm. It worked, but it was extremely slow.
A simple example runs in seconds on my laptop with 2 processors (simulating 2
nodes) and takes 10 minutes on 2 nodes of BW.
Of course I expected it to be slower, but not by this much.
That's why I decided to use the PrgEnv-cray environment; it's the native
toolchain.
AMPI does not support +ppn 30; it takes the thread count from the aprun
parameters. My startup line with only 2 nodes and only 1 worker thread per
process is not wrong.
Since I was having trouble running it, I simplified the example to better
understand what was going on.
However, it's good to know that your application uses PrgEnv-gnu.
I was worried that mine was so slow because I was using it, or because I
was missing something in the build.
I really want to make it work with PrgEnv-cray right now, but I don't know
what I'm doing wrong.
Thanks for your answer!
Leonardo.
On Thu, Oct 29, 2015 at 11:00 AM, Scott Field
<sfield AT astro.cornell.edu>
wrote:
Hi Leonardo,
I have a charm++ application running on blue waters, and hopefully some
of this will carry over to AMPI.
In addition to the default blue waters environment, I use
module swap PrgEnv-cray PrgEnv-gnu/5.2.40
module load craype-hugepages2M
module load rca
and my charm++ build includes the option "persistent". To launch the
application I do
aprun -n 2 -r 1 -N 1 -d 31 ./ExecutableName +ppn 30 +pemap 1-30 +commap 0
On startup, my charm++ output looks different from yours. In particular, I
see
"Charm++> Running in SMP mode: numNodes 2, 30 worker threads per process"
while yours reads
"*Charm++> Running in SMP mode: numNodes 2, 1 worker threads per
process"*
These differences may or may not explain the errors you see. Hopefully it
helps. Good luck!
Scott
On Thu, Oct 29, 2015 at 1:58 AM, Leonardo Duarte
<leo.duarte AT gmail.com>
wrote:
Hello Everyone,
I'm a PhD student in the CEE department at UIUC, and I would
really appreciate it if anyone could help me with Charm.
I'm trying to run my code on Blue Waters, and I'm using a library that
uses Charm++ AMPI.
I was able to build and run everything correctly with PrgEnv-gnu, but it
was extremely slow.
Now I'm trying to use the native Cray environment.
I'm using this BW environment and these modules:
*PrgEnv-cray*
*module load craype-hugepages8M*
*module load rca*
I built charm with this command line:
*./build LIBS gni-crayxe craycc smp -j16 --with-production
--build-shared -O3*
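For comparison, a build matching the PrgEnv-gnu/persistent configuration that Scott describes would look roughly like the line below. This is a sketch assembled from options mentioned in this thread, not a command anyone actually posted:

    ./build LIBS gni-crayxe persistent smp -j16 --with-production --build-shared -O3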
My code is composed of many shared libraries that are loaded
dynamically by the application using dlopen, dlsym, etc.
I'm able to build my code using these command lines in my makefiles:
To compile code that does not use Charm:
*CC -c -fPIC -O2 -I../../core/include -I../../tecgraf/tops/include -o
../../obj/obj64/linear/Linux3/linear.o
../../plugins/behavior/linear/linear.cpp*
To link code that does not use Charm:
*CC -shared -Wl,-soname,liblinear.so.1 -o liblinear.so.1.0
../../obj/obj64/linear/Linux3/linear.o -L../../tecgraf/tops/lib64/Linux3
-ltops -L../../bin/lib64/Linux3 -ltopsim*
To compile code that uses Charm:
*charmc -language model -c -fPIC -O2 -I../../core/include
-I../../tecgraf/tops/include -I../../tecgraf/tops/include/vis
-I../../../bin/charm/include -o
../../obj/obj64/parebepcg/Linux3/parebepcg.o
../../plugins/linearsystem/ebepcg/parebepcg.cpp*
To link code that uses Charm:
*charmc -shared -language ampi -Wl,-soname,libparebepcg.so.1 -o
libparebepcg.so.1.0 ../../obj/obj64/parebepcg/Linux3/parebepcg.o
-L../../tecgraf/tops/lib64/Linux3 -lpartops -ltopsrd -ltops
-L../../bin/lib64/Linux3 -lpartopsim*
To compile my app:
*charmc -language model -c -fPIC -O2 -I../../core/include
-I../../tecgraf/tops/include -I../../tecgraf/tops/include/vis
-I../../plugins -o
../../obj/obj64/partopsimapp/partopsimapp/Linux3/parmain.o
../../tests/app/parmain.cpp*
To link my app:
*charmc -language ampi -dynamic -o ../../bin/lib64/Linux3/partopsimapp
../../obj/obj64/partopsimapp/partopsimapp/Linux3/parmain.o
-L../../tecgraf/tops/lib64/Linux3 -lpartops -ltopsrd -ltops
-L../../bin/lib64/Linux3 -lpartopsim -lpartopsimlib -Wl,--no-as-needed
-ldl*
This is the error that I get:
*_pmiu_daemon(SIGCHLD): [NID 16828] [c19-9c1s1n0] [Thu Oct 29 00:35:04
2015] PE RANK 0 exit signal Segmentation fault*
*[NID 16828] 2015-10-29 00:35:04 Apid 28607883: initiated application
termination*
*_pmiu_daemon(SIGCHLD): [NID 16829] [c19-9c1s1n1] [Thu Oct 29 00:35:04
2015] PE RANK 1 exit signal Segmentation fault*
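When a dlopen-based application dies with a bare segmentation fault like this, one quick sanity check is to confirm that each plugin records the dependencies you expect and that the main binary resolves everything it needs. A minimal sketch using standard tools (paths taken from the link lines above; run the first command from the directory where the plugin was linked):

    readelf -d libparebepcg.so.1.0 | grep -E 'NEEDED|SONAME|RUNPATH'
    ldd ../../bin/lib64/Linux3/partopsimapp | grep "not found"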
I put some extra info at the end of the email in case you need it.
I read a lot of things on the internet and I've been trying a lot, but
now I think I need some help.
Am I missing something? Is this the correct way to handle it?
I really appreciate any suggestions.
Thank you.
Leonardo.
Extra info
These are my environment variables:
echo $PATH
*.:/u/psp/duarte/bin/lua5:/u/psp/duarte/bin/tolua5:/u/psp/duarte/bin/charm/gni-crayxe-smp-craycc/bin:/u/psp/duarte/bin/charm/gni-crayxe-persistent-smp/bin:/sw/xe/darshan/2.3.0/darshan-2.3.0_cle52/bin:/sw/admin/scripts:/sw/user/scripts:/sw/xe/altd/bin:/usr/local/gsi-openssh-6.2p2-2/bin:/opt/java/jdk1.7.0_45/bin:/usr/local/globus-5.2.4/bin:/usr/local/globus-5.2.4/sbin:/opt/moab/8.1/bin:/opt/moab/8.1/sbin:/opt/torque/5.0.2-bwpatch/sbin:/opt/torque/5.0.2-bwpatch/bin:/opt/cray/mpt/7.2.0/gni/bin:/opt/cray/rca/1.0.0-2.0502.53711.3.125.gem/bin:/opt/cray/alps/5.2.1-2.0502.9041.11.6.gem/sbin:/opt/cray/alps/5.2.1-2.0502.9041.11.6.gem/bin:/opt/cray/dvs/2.5_0.9.0-1.0502.1873.1.142.gem/bin:/opt/cray/xpmem/0.1-2.0502.55507.3.2.gem/bin:/opt/cray/dmapp/7.0.1-1.0502.9501.5.211.gem/bin:/opt/cray/pmi/5.0.6-1.0000.10439.140.3.gem/bin:/opt/cray/ugni/5.0-1.0502.9685.4.24.gem/bin:/opt/cray/udreg/2.3.2-1.0502.9275.1.25.gem/bin:/opt/cray/cce/8.3.10/cray-binutils/x86_64-unknown-linux-gnu/bin:/opt/cray/cce/8.3.10/craylibs/x86-64/bin:/opt/cray/cce/8.3.10/cftn/bin:/opt/cray/cce/8.3.10/CC/bin:/opt/cray/craype/2.3.0/bin:/opt/cray/eslogin/eswrap/1.1.0-1.020200.1231.0/bin:/opt/modules/3.2.10.3/bin:/u/psp/duarte/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin:/usr/lib/qt3/bin:/opt/cray/bin*
echo $LD_LIBRARY_PATH
*.:/u/psp/duarte/topsim/bin/lib64/Linux3:/u/psp/duarte/topsim/bin/libd64/Linux3:/u/psp/duarte/bin/charm/gni-crayxe-smp-craycc/lib_so:/u/psp/duarte/bin/charm/gni-crayxe-smp-craycc/lib:/u/psp/duarte/bin/charm/gni-crayxe-persistent-smp/lib:/u/psp/duarte/lib:/sw/xe/darshan/2.3.0/darshan-2.3.0_cle52/lib:/usr/local/globus-5.2.4/lib64:/usr/local/globus/lib64*
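Note that this search path contains both the gni-crayxe-smp-craycc build (lib_so and lib) and the earlier gni-crayxe-persistent-smp build. A quick, illustrative way to check which Charm++ build the executable actually resolves against at load time:

    ldd ../../bin/lib64/Linux3/partopsimapp | grep "bin/charm"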
My app output:
*Charm++> memory pool registered memory limit: 200000MB, send limit:
100000MB*
*Charm++> only comm thread send/recv messages*
*Charm++> Cray TLB page size: 8192K*
*Charm++> Running in SMP mode: numNodes 2, 1 worker threads per process*
*Charm++> The comm. thread both sends and receives messages*
*Converse/Charm++ Commit ID: v6.6.1-0-g74a2cc5*
*CharmLB> Load balancer assumes all CPUs are same.*
*Charm++> Running on 2 unique compute nodes (32-way SMP).*
**** Topsim 0.1.0 ****
*[0] topParInit() registered*
*[0] TopParContext created: 0!*
*[0] topParInit() array created*
*[1] TopParContext created: 1!*
*[1] topParInit() registered*
*[1] topParInit() array created*
*[0] topParInit() done!*
*[1] topParInit() done!*
*[0] PARTOPS: Slave started at processor 0, node: 0, rank: 0.*
*[0] PARTOPS: MODEL CREATED! rank: 0*
*[1] PARTOPS: Slave started at processor 1, node: 1, rank: 0.*
*[1] PARTOPS: MODEL CREATED! rank: 0*
*Plugin loaded libparebepcg.so*
*Plugin loaded libpartreader.so*
*Plugin loaded libisotropic.so*
*Plugin loaded liblinear.so*
*Plugin loaded libparsimp.so*
*Plugin loaded libbrick.so*
*Plugin loaded libpartreader.so*
*Plugin loaded libparebepcg.so*
*Plugin loaded libparloadcontrol.so*
*Plugin loaded libparwriter.so*
*Plugin loaded libparsimp.so*
*Plugin loaded libparjacobi.so*
*Plugin loaded libbrick.so*
*Plugin loaded libparwriter.so*
*Plugin loaded liblinear.so*
*Plugin loaded libisotropic.so*
*Plugin loaded libparloadcontrol.so*
*Plugin loaded libparjacobi.so*
*Application 28607883 exit codes: 139*
*Application 28607883 resources: utime ~2s, stime ~2s, Rss ~15384,
inblocks ~10927, outblocks ~18489*
*Thu Oct 29 00:35:04 CDT 2015*
This is my PBS script
#!/bin/bash
### set the number of nodes
### set the number of PEs per node
#PBS -l nodes=2:ppn=1:xe
### set the wallclock time
#PBS -l walltime=00:20:00
### set the job name
#PBS -N topsim
### set the job stdout and stderr
#PBS -e topsim.err
#PBS -o topsim.out
### set email notification
#PBS -m bea
#PBS -M leo.duarte AT gmail.com
### In case of multiple allocations, select which one to charge
##PBS -A xyz
# NOTE: lines that begin with "#PBS" are not interpreted by the shell but ARE
# used by the batch system, whereas lines that begin with multiple # signs,
# like "##PBS", are considered "commented out" by the batch system
# and have no effect.
# If you launched the job in a directory prepared for the job to run within,
# you'll want to cd to that directory
# [uncomment the following line to enable this]
cd $PBS_O_WORKDIR
# Alternatively, the job script can create its own job-ID-unique directory
# to run within. In that case you'll need to create and populate that
# directory with executables and perhaps inputs
# [uncomment and customize the following lines to enable this behavior]
# mkdir -p /scratch/sciteam/$USER/$PBS_JOBID
# cd /scratch/sciteam/$USER/$PBS_JOBID
# cp /scratch/job/setup/directory/* .
# To add certain modules that you do not have added via ~/.modules
. /opt/modules/default/init/bash # NEEDED to add module commands to shell
#module swap PrgEnv-cray PrgEnv-gnu
module add craype-hugepages8M
module add rca
#export CRAY_ROOTFS=DSL
echo $LD_LIBRARY_PATH
#export APRUN_XFER_LIMITS=1 # to transfer shell limits to the executable
### launch the application
### redirecting stdin and stdout if needed
### NOTE: (the "in" file must exist for input)
# used for timing
date
aprun -n2 -N1 ./partopsimapp \
    ../../../tests/data/input/config/plugins_simp_parebepcg_jacobi_brick.lua \
    ../../../tests/data/input/examples/CantSymm/CantSymm12_2.pos \
    ../../../tests/data/output/CantSymm12_2_result.pos
# used for timing
date
### For more information see the man page for aprun
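One more note on the script above: the executable is linked with -dynamic and loads its plugins via dlopen, and on Blue Waters dynamically linked binaries generally also need the commented-out "export CRAY_ROOTFS=DSL" line enabled so that the compute nodes can see the shared libraries. Treat this as a suggestion to verify against the Blue Waters documentation rather than a confirmed fix; the only change would be:

    export CRAY_ROOTFS=DSL

with the aprun line itself left unchanged.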