charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
List archive
- From: Scott Field <sfield AT astro.cornell.edu>
- To: Leonardo Duarte <leo.duarte AT gmail.com>
- Cc: Charm Mailing List <charm AT cs.illinois.edu>
- Subject: Re: [charm] Using Charm AMPI
- Date: Thu, 29 Oct 2015 12:00:41 -0400
Hi Leonardo,
I have a Charm++ application running on Blue Waters, and hopefully some of this will carry over to AMPI.
In addition to the default Blue Waters environment, I use:
module swap PrgEnv-cray PrgEnv-gnu/5.2.40
module load craype-hugepages2M
module load rca
and my Charm++ build includes the option "persistent". To launch the application, I do
>>> aprun -n 2 -r 1 -N 1 -d 31 ./ExecutableName +ppn 30 +pemap 1-30 +commap 0
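In case the flags are not obvious: -N 1 places one process per node, -d 31 gives that process 31 cores, and the Charm++ arguments pin 30 worker threads to cores 1-30 (+ppn 30 +pemap 1-30) with the communication thread on core 0 (+commap 0). The Charm++ build itself was roughly the following; I'm writing it from memory, so treat it as a sketch and double-check it against your actual build line:

./build charm++ gni-crayxe persistent smp -j16 --with-production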
On startup, my Charm++ output looks different from yours. In particular, I see
"Charm++> Running in SMP mode: numNodes 2, 30 worker threads per process"
while yours reads
"Charm++> Running in SMP mode: numNodes 2, 1 worker threads per process"
These differences may or may not explain the errors you see. Hopefully it helps. Good luck!
Scott
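P.S. If you want to try a layout closer to mine, the launch adapted to your script would be something like the following (untested with your application, and with <your input and output arguments> standing in for the .lua/.pos paths from your script). You would also need the full 32 cores per node in your PBS request, e.g. nodes=2:ppn=32:xe instead of ppn=1:

>>> aprun -n 2 -r 1 -N 1 -d 31 ./partopsimapp <your input and output arguments> +ppn 30 +pemap 1-30 +commap 0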
On Thu, Oct 29, 2015 at 1:58 AM, Leonardo Duarte <leo.duarte AT gmail.com> wrote:
Hello everyone,

I'm a PhD student in the CEE department at UIUC, and I would really appreciate it if anyone could help me with Charm.

I'm trying to run my code on Blue Waters, and I'm using a library that uses Charm++ AMPI. With PrgEnv-gnu I was able to build and run everything correctly, but extremely slowly. Now I'm trying to use the native Cray environment.

I'm using this BW environment and these modules:

PrgEnv-cray
module load craype-hugepages8M
module load rca

I built Charm with this command line:

./build LIBS gni-crayxe craycc smp -j16 --with-production --build-shared -O3

My code is composed of many shared libraries that are loaded dynamically by the application using dlopen, dlsym, etc. I build my code with these command lines in my makefiles:

To compile code that does not use Charm:
CC -c -fPIC -O2 -I../../core/include -I../../tecgraf/tops/include -o ../../obj/obj64/linear/Linux3/linear.o ../../plugins/behavior/linear/linear.cpp

To link code that does not use Charm:
CC -shared -Wl,-soname,liblinear.so.1 -o liblinear.so.1.0 ../../obj/obj64/linear/Linux3/linear.o -L../../tecgraf/tops/lib64/Linux3 -ltops -L../../bin/lib64/Linux3 -ltopsim

To compile code that uses Charm:
charmc -language model -c -fPIC -O2 -I../../core/include -I../../tecgraf/tops/include -I../../tecgraf/tops/include/vis -I../../../bin/charm/include -o ../../obj/obj64/parebepcg/Linux3/parebepcg.o ../../plugins/linearsystem/ebepcg/parebepcg.cpp

To link code that uses Charm:
charmc -shared -language ampi -Wl,-soname,libparebepcg.so.1 -o libparebepcg.so.1.0 ../../obj/obj64/parebepcg/Linux3/parebepcg.o -L../../tecgraf/tops/lib64/Linux3 -lpartops -ltopsrd -ltops -L../../bin/lib64/Linux3 -lpartopsim

To compile my app:
charmc -language model -c -fPIC -O2 -I../../core/include -I../../tecgraf/tops/include -I../../tecgraf/tops/include/vis -I../../plugins -o ../../obj/obj64/partopsimapp/partopsimapp/Linux3/parmain.o ../../tests/app/parmain.cpp

To link my app:
charmc -language ampi -dynamic -o ../../bin/lib64/Linux3/partopsimapp ../../obj/obj64/partopsimapp/partopsimapp/Linux3/parmain.o -L../../tecgraf/tops/lib64/Linux3 -lpartops -ltopsrd -ltops -L../../bin/lib64/Linux3 -lpartopsim -lpartopsimlib -Wl, --no-as-needed -ldl
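Since those lines are long, here is the pattern they follow, trimmed down with placeholder paths (a sketch, not my actual makefile rules):

# plugins with no Charm code: plain Cray compiler wrapper
CC -c -fPIC -O2 -I<includes> -o <plugin>.o <plugin>.cpp
CC -shared -Wl,-soname,lib<plugin>.so.1 -o lib<plugin>.so.1.0 <plugin>.o <libs>

# plugins that use Charm++/AMPI: compiled and linked through charmc
charmc -language model -c -fPIC -O2 -I<includes> -o <plugin>.o <plugin>.cpp
charmc -shared -language ampi -Wl,-soname,lib<plugin>.so.1 -o lib<plugin>.so.1.0 <plugin>.o <libs>

# the main app: linked with charmc and -dynamic, then it dlopen()s the plugins at runtime
charmc -language ampi -dynamic -o partopsimapp parmain.o <libs> -ldl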
This is the error that I get:

_pmiu_daemon(SIGCHLD): [NID 16828] [c19-9c1s1n0] [Thu Oct 29 00:35:04 2015] PE RANK 0 exit signal Segmentation fault
[NID 16828] 2015-10-29 00:35:04 Apid 28607883: initiated application termination
_pmiu_daemon(SIGCHLD): [NID 16829] [c19-9c1s1n1] [Thu Oct 29 00:35:04 2015] PE RANK 1 exit signal Segmentation fault

I put some extra info at the end of this email in case you need it. I have read a lot on the internet and have been trying many things, but now I think I need some help. Am I missing something? Is this the correct way to handle it?

I really appreciate any suggestions. Thank you.

Leonardo

Extra info

These are my environment variables:

echo $PATH
.:/u/psp/duarte/bin/lua5:/u/psp/duarte/bin/tolua5:/u/psp/duarte/bin/charm/gni-crayxe-smp-craycc/bin:/u/psp/duarte/bin/charm/gni-crayxe-persistent-smp/bin:/sw/xe/darshan/2.3.0/darshan-2.3.0_cle52/bin:/sw/admin/scripts:/sw/user/scripts:/sw/xe/altd/bin:/usr/local/gsi-openssh-6.2p2-2/bin:/opt/java/jdk1.7.0_45/bin:/usr/local/globus-5.2.4/bin:/usr/local/globus-5.2.4/sbin:/opt/moab/8.1/bin:/opt/moab/8.1/sbin:/opt/torque/5.0.2-bwpatch/sbin:/opt/torque/5.0.2-bwpatch/bin:/opt/cray/mpt/7.2.0/gni/bin:/opt/cray/rca/1.0.0-2.0502.53711.3.125.gem/bin:/opt/cray/alps/5.2.1-2.0502.9041.11.6.gem/sbin:/opt/cray/alps/5.2.1-2.0502.9041.11.6.gem/bin:/opt/cray/dvs/2.5_0.9.0-1.0502.1873.1.142.gem/bin:/opt/cray/xpmem/0.1-2.0502.55507.3.2.gem/bin:/opt/cray/dmapp/7.0.1-1.0502.9501.5.211.gem/bin:/opt/cray/pmi/5.0.6-1.0000.10439.140.3.gem/bin:/opt/cray/ugni/5.0-1.0502.9685.4.24.gem/bin:/opt/cray/udreg/2.3.2-1.0502.9275.1.25.gem/bin:/opt/cray/cce/8.3.10/cray-binutils/x86_64-unknown-linux-gnu/bin:/opt/cray/cce/8.3.10/craylibs/x86-64/bin:/opt/cray/cce/8.3.10/cftn/bin:/opt/cray/cce/8.3.10/CC/bin:/opt/cray/craype/2.3.0/bin:/opt/cray/eslogin/eswrap/1.1.0-1.020200.1231.0/bin:/opt/modules/3.2.10.3/bin:/u/psp/duarte/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin:/usr/lib/qt3/bin:/opt/cray/bin

echo $LD_LIBRARY_PATH
.:/u/psp/duarte/topsim/bin/lib64/Linux3:/u/psp/duarte/topsim/bin/libd64/Linux3:/u/psp/duarte/bin/charm/gni-crayxe-smp-craycc/lib_so:/u/psp/duarte/bin/charm/gni-crayxe-smp-craycc/lib:/u/psp/duarte/bin/charm/gni-crayxe-persistent-smp/lib:/u/psp/duarte/lib:/sw/xe/darshan/2.3.0/darshan-2.3.0_cle52/lib:/usr/local/globus-5.2.4/lib64:/usr/local/globus/lib64

My app output:

Charm++> memory pool registered memory limit: 200000MB, send limit: 100000MB
Charm++> only comm thread send/recv messages
Charm++> Cray TLB page size: 8192K
Charm++> Running in SMP mode: numNodes 2, 1 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.6.1-0-g74a2cc5
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 2 unique compute nodes (32-way SMP).
*** Topsim 0.1.0 ***
[0] topParInit() registered
[0] TopParContext created: 0!
[0] topParInit() array created
[1] TopParContext created: 1!
[1] topParInit() registered
[1] topParInit() array created
[0] topParInit() done!
[1] topParInit() done!
[0] PARTOPS: Slave started at processor 0, node: 0, rank: 0.
[0] PARTOPS: MODEL CREATED! rank: 0
[1] PARTOPS: Slave started at processor 1, node: 1, rank: 0.
[1] PARTOPS: MODEL CREATED! rank: 0
Plugin loaded libparebepcg.so
Plugin loaded libpartreader.so
Plugin loaded libisotropic.so
Plugin loaded liblinear.so
Plugin loaded libparsimp.so
Plugin loaded libbrick.so
Plugin loaded libpartreader.so
Plugin loaded libparebepcg.so
Plugin loaded libparloadcontrol.so
Plugin loaded libparwriter.so
Plugin loaded libparsimp.so
Plugin loaded libparjacobi.so
Plugin loaded libbrick.so
Plugin loaded libparwriter.so
Plugin loaded liblinear.so
Plugin loaded libisotropic.so
Plugin loaded libparloadcontrol.so
Plugin loaded libparjacobi.so
Application 28607883 exit codes: 139
Application 28607883 resources: utime ~2s, stime ~2s, Rss ~15384, inblocks ~10927, outblocks ~18489
Thu Oct 29 00:35:04 CDT 2015

This is my PBS script:

#!/bin/bash
### set the number of nodes
### set the number of PEs per node
#PBS -l nodes=2:ppn=1:xe
### set the wallclock time
#PBS -l walltime=00:20:00
### set the job name
#PBS -N topsim
### set the job stdout and stderr
#PBS -e topsim.err
#PBS -o topsim.out
### set email notification
#PBS -m bea
#PBS -M leo.duarte AT gmail.com
### In case of multiple allocations, select which one to charge
##PBS -A xyz

# NOTE: lines that begin with "#PBS" are not interpreted by the shell but ARE
# used by the batch system, whereas lines that begin with multiple # signs,
# like "##PBS", are considered "commented out" by the batch system
# and have no effect.

# If you launched the job in a directory prepared for the job to run within,
# you'll want to cd to that directory
# [uncomment the following line to enable this]
cd $PBS_O_WORKDIR

# Alternatively, the job script can create its own job-ID-unique directory
# to run within. In that case you'll need to create and populate that
# directory with executables and perhaps inputs
# [uncomment and customize the following lines to enable this behavior]
# mkdir -p /scratch/sciteam/$USER/$PBS_JOBID
# cd /scratch/sciteam/$USER/$PBS_JOBID
# cp /scratch/job/setup/directory/* .

# To add certain modules that you do not have added via ~/.modules
. /opt/modules/default/init/bash # NEEDED to add module commands to shell
#module swap PrgEnv-cray PrgEnv-gnu
module add craype-hugepages8M
module add rca
#export CRAY_ROOTFS=DSL

echo $LD_LIBRARY_PATH

#export APRUN_XFER_LIMITS=1 # to transfer shell limits to the executable

### launch the application
### redirecting stdin and stdout if needed
### NOTE: (the "in" file must exist for input)

# used for timing
date

aprun -n2 -N1 ./partopsimapp ../../../tests/data/input/config/plugins_simp_parebepcg_jacobi_brick.lua ../../../tests/data/input/examples/CantSymm/CantSymm12_2.pos ../../../tests/data/output/CantSymm12_2_result.pos

# used for timing
date

### For more information see the man page for aprun
- [charm] Using Charm AMPI, Leonardo Duarte, 10/29/2015
- Re: [charm] Using Charm AMPI, Scott Field, 10/29/2015
- Re: [charm] Using Charm AMPI, Leonardo Duarte, 10/29/2015
- Re: [charm] [ppl] Using Charm AMPI, Jim Phillips, 10/29/2015
- Message not available
- Re: [charm] Using Charm AMPI, Sam White, 10/29/2015
- Re: [charm] Using Charm AMPI, Leonardo Duarte, 10/29/2015
- Re: [charm] [ppl] Using Charm AMPI, Jim Phillips, 10/29/2015
- Re: [charm] [ppl] Using Charm AMPI, Leonardo Duarte, 10/30/2015
- Re: [charm] [ppl] Using Charm AMPI, Scott Field, 10/30/2015
- Message not available
- Re: [charm] [ppl] Using Charm AMPI, Sam White, 10/30/2015
- Re: [charm] [ppl] Using Charm AMPI, Phil Miller, 10/30/2015
- Re: [charm] [ppl] Using Charm AMPI, Jim Phillips, 10/30/2015
- Re: [charm] [ppl] Using Charm AMPI, Sam White, 10/30/2015
- Re: [charm] [ppl] Using Charm AMPI, Leonardo Duarte, 10/30/2015
- Re: [charm] Using Charm AMPI, Scott Field, 10/29/2015