charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
- From: "Papatheodore, Thomas L." <papatheodore AT ornl.gov>
- To: Phil Miller <mille121 AT illinois.edu>, Ted Packwood <malice AT cray.com>
- Cc: "charm AT lists.cs.illinois.edu" <charm AT lists.cs.illinois.edu>, "Choi, Jaemin" <jchoi157 AT illinois.edu>
- Subject: Re: [charm] CrayPat with Charm++
- Date: Mon, 10 Jul 2017 14:26:48 +0000
- Accept-language: en-US
Hey all-
I’m still having trouble profiling the Charm++ examples on Titan with GPUs. Here are two specific cases…
My attempt to run the hello program ----------------------
COMPILE:
[tpapathe@titan-ext3: /lustre/atlas2/csc198/proj-shared/tpapathe/charm/gni-crayxe-cuda-perftools/examples/charm++/cuda/hello]$ make OPTS="-save"
../../../../bin/charmc -save hello.ci
../../../../bin/charmc -save -c hello.C
/opt/nvidia/cudatoolkit7.5/7.5.18-1.0502.10743.2.1/bin/nvcc -c -use_fast_math -I/usr/local/cuda/include -I../../../../include helloCUDA.cu
../../../../bin/charmc -save -language charm++ -o hello hello.o helloCUDA.o -lcuda -lcudart
INSTRUMENT:
[tpapathe@titan-ext3: /lustre/atlas2/csc198/proj-shared/tpapathe/charm/gni-crayxe-cuda-perftools/examples/charm++/cuda/hello]$ pat_build -u -Dtrace-text-size=800 ./hello
WARNING: Tracing non-group functions was limited to those 803 - 9318 bytes in size.
INFO: A total of 130 selected non-group functions were traced.
RUN UN-INSTRUMENTED PROGRAM FROM INTERACTIVE COMPUTE NODE:
[tpapathe@titan-login5: /lustre/atlas2/csc198/proj-shared/tpapathe/charm/gni-crayxe-cuda-perftools/examples/charm++/cuda/hello]$ aprun -n1 ./hello
Charm++> Running on Gemini (GNI) with 1 processes
Charm++> static SMSG
Charm++> memory pool init block size: 8MB, total memory pool limit 0MB (0 means no limit)
Charm++> memory pool registered memory limit: 200000MB, send limit: 100000MB
Charm++> Cray TLB page size: 2048K
Charm++> Running in non-SMP mode: numPes 1
Converse/Charm++ Commit ID: v6.8.0-beta1-287-gd57c83d
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (16-way SMP).
Running Hello on 1 processors for 5 elements
Hello 0 created
Hello 1 created
Hello 2 created
Hello 3 created
Hello 4 created
Hi from element 0
calling kernel
Sending a Hi Message
Hi from element 1
calling kernel
Sending a Hi Message
Hi from element 2
calling kernel
Sending a Hi Message
Hi from element 3
calling kernel
Sending a Hi Message
Hi from element 4
All done
EXIT HYBRID API
[Partition 0][Node 0] End of program
Application 15025878 resources: utime ~0s, stime ~2s, Rss ~1142116, inblocks ~10482, outblocks ~31178
RUN INSTRUMENTED PROGRAM FROM INTERACTIVE COMPUTE NODE:
[tpapathe@titan-login5: /lustre/atlas2/csc198/proj-shared/tpapathe/charm/gni-crayxe-cuda-perftools/examples/charm++/cuda/hello]$ aprun -n1 ./hello+pat
CrayPat/X: Version 6.4.5 Revision 87dd5b8 01/23/17 15:37:24
Charm++> Running on Gemini (GNI) with 1 processes
Charm++> static SMSG
Charm++> memory pool init block size: 8MB, total memory pool limit 0MB (0 means no limit)
Charm++> memory pool registered memory limit: 200000MB, send limit: 100000MB
Charm++> Cray TLB page size: 2048K
Charm++> Running in non-SMP mode: numPes 1
Converse/Charm++ Commit ID: v6.8.0-beta1-287-gd57c83d
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (16-way SMP).
pat[WARNING][0]: abort process 11933 because of signal 11
Experiment data file written:
/lustre/atlas2/csc198/proj-shared/tpapathe/charm/gni-crayxe-cuda-perftools/examples/charm++/cuda/hello/hello+pat+11933-2351t.xf
_pmiu_daemon(SIGCHLD): [NID 02351] [c6-1c0s7n3] [Mon Jul 10 10:14:49 2017] PE RANK 0 exit signal Segmentation fault
Application 15025897 exit codes: 139
Application 15025897 resources: utime ~0s, stime ~1s, Rss ~137348, inblocks ~11356, outblocks ~32666
If I instead try to run the vectorAdd program, the un-instrumented code seg faults before I even get to the instrumented version:
My attempt to run the vectorAdd program ----------------------
COMPILE:
[tpapathe@titan-ext3: /lustre/atlas2/csc198/proj-shared/tpapathe/charm/gni-crayxe-cuda-perftools/examples/charm++/cuda/vectorAdd]$ make OPTS="-save"
../../../../bin/charmc -save vectorAdd.ci
../../../../bin/charmc -save -O3 -c vectorAdd.C
/opt/nvidia/cudatoolkit7.5/7.5.18-1.0502.10743.2.1/bin/nvcc -O3 -c -use_fast_math -DGPU_MEMPOOL -DCUDA_USE_CUDAMALLOCHOST -arch=compute_35 -code=sm_35 -I/usr/local/cuda/include -I../../../../../src/arch/cuda/hybridAPI -I../../../../include -o vectorAddCU.o vectorAdd.cu
../../../../bin/charmc -save -language charm++ -o vectorAdd vectorAdd.o vectorAddCU.o -lcublas
INSTRUMENT:
[tpapathe@titan-ext3: /lustre/atlas2/csc198/proj-shared/tpapathe/charm/gni-crayxe-cuda-perftools/examples/charm++/cuda/vectorAdd]$ pat_build -u -Dtrace-text-size=800 ./vectorAdd
WARNING: Tracing non-group functions was limited to those 803 - 9318 bytes in size.
INFO: A total of 130 selected non-group functions were traced.
RUN UN-INSTRUMENTED PROGRAM FROM INTERACTIVE COMPUTE NODE:
[tpapathe@titan-login5: /lustre/atlas2/csc198/proj-shared/tpapathe/charm/gni-crayxe-cuda-perftools/examples/charm++/cuda/vectorAdd]$ aprun -n1 ./vectorAdd+pat
CrayPat/X: Version 6.4.5 Revision 87dd5b8 01/23/17 15:37:24
Charm++> Running on Gemini (GNI) with 1 processes
Charm++> static SMSG
Charm++> memory pool init block size: 8MB, total memory pool limit 0MB (0 means no limit)
Charm++> memory pool registered memory limit: 200000MB, send limit: 100000MB
Charm++> Cray TLB page size: 2048K
Charm++> Running in non-SMP mode: numPes 1
Converse/Charm++ Commit ID: v6.8.0-beta1-287-gd57c83d
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (16-way SMP).
pat[WARNING][0]: abort process 11610 because of signal 11
Experiment data file written:
/lustre/atlas2/csc198/proj-shared/tpapathe/charm/gni-crayxe-cuda-perftools/examples/charm++/cuda/vectorAdd/vectorAdd+pat+11610-2351t.xf
_pmiu_daemon(SIGCHLD): [NID 02351] [c6-1c0s7n3] [Mon Jul 10 10:02:29 2017] PE RANK 0 exit signal Segmentation fault
Application 15025560 exit codes: 139
Application 15025560 resources: utime ~0s, stime ~1s, Rss ~138756, inblocks ~11390, outblocks ~32755
Is there something obvious here that I am doing incorrectly? The hello example program appears to work correctly, but the CrayPat profiling on it does not. The vectorAdd example program does not appear to work correctly even without profiling. If you have any further advice, I would greatly appreciate it. Thank you for your help.
-Tom
From: "Choi, Jaemin" <jchoi157 AT illinois.edu>
Hi Tom,
First of all, thank you for looking into the CrayPat issue in my stead.
To compile the CUDA hello example on Titan, you need to edit the NVCC variable in the Makefile to be $(CUDATOOLKIT_HOME)/bin/nvcc. This is a mistake on our part: we assumed nvcc would reside in /usr/local/cuda.
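A minimal sketch of what that edit might look like, assuming the example's Makefile assigns NVCC near the top and that the loaded cudatoolkit module exports CUDATOOLKIT_HOME (both details are assumptions here, not taken from the actual Makefile):

  # examples/charm++/cuda/hello/Makefile (hypothetical excerpt)
  # NVCC = /usr/local/cuda/bin/nvcc         # old hard-coded default location
  NVCC = $(CUDATOOLKIT_HOME)/bin/nvcc       # pick up nvcc from the Cray cudatoolkit module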
Thank you.
Jaemin Choi
Ph.D. Student, Research Assistant
Parallel Programming Laboratory
University of Illinois Urbana-Champaign

From: Papatheodore, Thomas L. [papatheodore AT ornl.gov]

Ok, it now instruments the code (correctly?):
[tpapathe@titan-ext2: /lustre/atlas2/csc198/proj-shared/tpapathe/charm/gni-crayxe-cuda-perftools/tests/charm++/simplearrayhello]$ pat_build -O apa ./hello
INFO: A maximum of 712 functions from group 'mpi' will be traced.
INFO: A maximum of 56 functions from group 'realtime' will be traced.
INFO: A maximum of 199 functions from group 'syscall' will be traced.
But when I try to run it, I get the following:
[tpapathe@titan-batch6: /lustre/atlas2/csc198/proj-shared/tpapathe/charm/gni-crayxe-cuda-perftools/tests/charm++/simplearrayhello]$ aprun -n1 ./hello+pat
CrayPat/X: Version 6.4.5 Revision 87dd5b8 01/23/17 15:37:24
pat[WARNING][0]: Collection of accelerator performance data for sampling experiments is not supported. To collect accelerator performance data perform a trace experiment. See the intro_craypat(1) man page on how to perform a trace experiment.
Charm++> Running on Gemini (GNI) with 1 processes
Charm++> static SMSG
Charm++> memory pool init block size: 8MB, total memory pool limit 0MB (0 means no limit)
Charm++> memory pool registered memory limit: 200000MB, send limit: 100000MB
Charm++> Cray TLB page size: 8192K
Charm++> Running in non-SMP mode: numPes 1
Converse/Charm++ Commit ID: v6.8.0-beta1-287-gd57c83d
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (16-way SMP).
libhugetlbfs [nid03789:29517]: WARNING: New heap segment map at 0x106bc800000 failed: Cannot allocate memory
libhugetlbfs [nid03789:29517]: WARNING: New heap segment map at 0x106bc800000 failed: Cannot allocate memory
libhugetlbfs [nid03789:29517]: WARNING: New heap segment map at 0x106bc800000 failed: Cannot allocate memory
…
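The accelerator warning above is tied to the sampling experiment that pat_build -O apa produces; per the message, accelerator data requires a trace experiment, as in the earlier pat_build -u build. A hedged sketch of a trace-style build (flags recalled from pat_build(1)/intro_craypat(1), not verified on this system):

  pat_build -w -u -Dtrace-text-size=800 ./hello   # -w makes tracing the default experiment
  aprun -n1 ./hello+pat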
If I instead compile the CUDA version of hello, make cannot find nvcc because it looks for it in the default path where a CUDA installation normally resides:
[tpapathe@titan-ext2: /lustre/atlas2/csc198/proj-shared/tpapathe/charm/gni-crayxe-cuda-perftools/examples/charm++/cuda/hello]$ make OPTS="-save"
../../../../bin/charmc -save hello.ci
../../../../bin/charmc -save -c hello.C
/usr/local/cuda/bin/nvcc -c -use_fast_math -I/usr/local/cuda/include -I../../../../include helloCUDA.cu
make: /usr/local/cuda/bin/nvcc: Command not found
make: *** [helloCUDA.o] Error 127
Is there a way to add "-L /opt/nvidia/cudatoolkit7.5/7.5.18-1.0502.10743.2.1/bin/"? I tried adding it to the OPTS="-save" line but that didn't work.
From: <unmobile AT gmail.com> on behalf of Phil Miller <mille121 AT illinois.edu>
You can also run
make OPTS="-save"
and not worry about editing makefiles, etc.
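A minimal sketch of that no-edit approach, under the assumption that the Makefile's NVCC assignment is a plain one that a command-line variable can override (the variable name comes from Jaemin's note, not from inspecting the Makefile):

  cd examples/charm++/cuda/hello
  make OPTS="-save" NVCC=$CUDATOOLKIT_HOME/bin/nvcc   # command-line variables override Makefile assignments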
On Thu, Jul 6, 2017 at 4:56 PM, Ted Packwood <malice AT cray.com> wrote: