- From: Sam White <white67 AT illinois.edu>
- To: "Van Der Wijngaart, Rob F" <rob.f.van.der.wijngaart AT intel.com>
- Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
- Subject: Re: [charm] Adaptive MPI
- Date: Wed, 23 Nov 2016 14:21:13 -0600
To debug the issue with your PUP code, I would suggest adding print statements before and after your AMPI_Migrate() call, and inside the PUP routine. For these types of issues, it often helps to see where in the PUP process (sizing, packing, deleting, unpacking) the runtime is when it fails.
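For example, something along these lines (a minimal sketch using the C PUP interface from pup_c.h; the chunk_t struct, its fields, and the routine name are placeholders for whatever your dchunkpup actually handles, not your code):

#include <stdio.h>
#include <stdlib.h>
#include "pup_c.h"   /* Charm++ C PUP interface: pup_er, pup_int, pup_doubles, ... */

/* Placeholder for the per-rank state your PUP routine manages. */
typedef struct {
  int     n;      /* number of doubles owned by this rank */
  double *grid;   /* heap buffer that must be packed/unpacked */
} chunk_t;

/* PUP routine in the style of dchunkpup; "data" is the user pointer
 * that was registered with the AMPI runtime for this rank. */
void chunk_pup(pup_er p, void *data)
{
  chunk_t *c = (chunk_t *)data;

  /* Report which PUP phase we are in, so a crash can be localized. */
  if (pup_isSizing(p))    printf("PUP: sizing\n");
  if (pup_isPacking(p))   printf("PUP: packing\n");
  if (pup_isUnpacking(p)) printf("PUP: unpacking\n");
  fflush(stdout);

  pup_int(p, &c->n);

  /* On unpacking, the heap buffer does not exist yet and must be reallocated. */
  if (pup_isUnpacking(p))
    c->grid = (double *)malloc(c->n * sizeof(double));

  pup_doubles(p, c->grid, (size_t)c->n);

  /* After the final packing pass the source copy of the heap data is freed. */
  if (pup_isPacking(p) && pup_isDeleting(p)) {
    free(c->grid);
    c->grid = NULL;
  }

  printf("PUP: phase done\n");
  fflush(stdout);
}

The last phase message printed before the crash then tells you whether the failure happens during sizing, packing, deleting, or unpacking.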
-Sam
Hi Sam,
The first experiment was successful, but the isomalloc example hangs. See below. Unless it is a symptom of something bigger, I am not going to worry about the latter, since I wasn’t planning to use isomalloc for heap migration anyway. My regular MPI code, on which the AMPI version is based, runs fine for all the parameters I have tried, but I reckon that it may contain a memory bug that manifests itself only with load balancing.
Rob
rfvander@klondike:~/Cjacobi3D$ make
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx -c jacobi.C
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx -o jacobi jacobi.o -module CommonLBs -lm
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx -c -DNO_PUP jacobi.C -o jacobi.iso.o
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx -o jacobi.iso jacobi.iso.o -module CommonLBs -memory isomalloc
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx -c -tlsglobal jacobi.C -o jacobi.tls.o
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx -o jacobi.tls jacobi.tls.o -tlsglobal -module CommonLBs #-memory isomalloc
/opt/charm/charm-6.7.0/multicore-linux64/bin/../lib/libconv-util.a(sockRoutines.o): In function `skt_lookup_ip':
sockRoutines.c:(.text+0x334): warning: Using 'gethostbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx -c jacobi-get.C
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx -o jacobi-get jacobi-get.o -module CommonLBs -lm
rfvander@klondike:~/Cjacobi3D$ ./charmrun +p3 ./jacobi 2 2 2 +vp8 +balancer RotateLB +LBDebug 1
Running command: ./jacobi 2 2 2 +vp8 +balancer RotateLB +LBDebug 1 +p3
Charm++: standalone mode (not using charmrun)
Charm++> Running in Multicore mode: 3 threads
Converse/Charm++ Commit ID: v6.7.0-1-gca55e1d
Warning> Randomization of stack pointer is turned on in kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try run with '+isomalloc_sync'.
CharmLB> Verbose level 1, load balancing period: 0.5 seconds
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (16-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
[0] RotateLB created
iter 1 time: 0.142733 maxerr: 2020.200000
iter 2 time: 0.157225 maxerr: 1696.968000
iter 3 time: 0.172039 maxerr: 1477.170240
iter 4 time: 0.146178 maxerr: 1319.433024
iter 5 time: 0.123098 maxerr: 1200.918072
iter 6 time: 0.131063 maxerr: 1108.425519
iter 7 time: 0.138213 maxerr: 1033.970839
iter 8 time: 0.138295 maxerr: 972.509242
iter 9 time: 0.138113 maxerr: 920.721889
iter 10 time: 0.121553 maxerr: 876.344030
CharmLB> RotateLB: PE [0] step 0 starting at 1.489509 Memory: 72.253906 MB
CharmLB> RotateLB: PE [0] strategy starting at 1.489573
CharmLB> RotateLB: PE [0] Memory: LBManager: 920 KB CentralLB: 3 KB
CharmLB> RotateLB: PE [0] #Objects migrating: 8, LBMigrateMsg size: 0.00 MB
CharmLB> RotateLB: PE [0] strategy finished at 1.489592 duration 0.000019 s
CharmLB> RotateLB: PE [0] step 0 finished at 1.507922 duration 0.018413 s
iter 11 time: 0.152840 maxerr: 837.779089
iter 12 time: 0.136401 maxerr: 803.868831
iter 13 time: 0.138095 maxerr: 773.751705
iter 14 time: 0.139319 maxerr: 746.772667
iter 15 time: 0.139327 maxerr: 722.424056
iter 16 time: 0.141794 maxerr: 700.305763
iter 17 time: 0.142484 maxerr: 680.097726
iter 18 time: 0.141056 maxerr: 661.540528
iter 19 time: 0.153895 maxerr: 644.421422
iter 20 time: 0.198588 maxerr: 628.564089
[Partition 0][Node 0] End of program
rfvander@klondike:~/Cjacobi3D$ ./charmrun +p3 ./jacobi.iso 2 2 2 +vp8 +balancer RotateLB +LBDebug 1
Running command: ./jacobi.iso 2 2 2 +vp8 +balancer RotateLB +LBDebug 1 +p3
Charm++: standalone mode (not using charmrun)
Charm++> Running in Multicore mode: 3 threads
^C
rfvander@klondike:~/Cjacobi3D$ ./charmrun +p3 ./jacobi.iso 2 2 2 +vp8 +balancer RotateLB +LBDebug 1 +isomalloc_sync
Running command: ./jacobi.iso 2 2 2 +vp8 +balancer RotateLB +LBDebug 1 +isomalloc_sync +p3
Charm++: standalone mode (not using charmrun)
Charm++> Running in Multicore mode: 3 threads
From: samt.white AT gmail.com [mailto:samt.white AT gmail.com] On Behalf Of Sam White
Sent: Wednesday, November 23, 2016 7:10 AM
To: Van Der Wijngaart, Rob F <rob.f.van.der.wijngaart AT intel.com>
Cc: charm AT cs.uiuc.edu
Subject: Re: Adaptive MPI
Can you try an example AMPI program with load balancing? You can try charm/examples/ampi/Cjacobi3D/, running with something like './charmrun +p3 ./jacobi 2 2 2 +vp8 +balancer RotateLB +LBDebug 1'. You can also test that example with Isomalloc by running jacobi.iso (and, as the warning in the Charm preamble output suggests, run with +isomalloc_sync). It also might help to build Charm++/AMPI with '-g' to get stack traces.
-Sam
On Wed, Nov 23, 2016 at 2:19 AM, Van Der Wijngaart, Rob F <rob.f.van.der.wijngaart AT intel.com> wrote:
Hello Team,
I am trying to troubleshoot my Adaptive MPI code that uses dynamic load balancing. It crashes with a segmentation fault in AMPI_Migrate. I checked, and dchunkpup (which I supplied) is called within AMPI_Migrate and finishes on all ranks. That is not to say it is correct, but the crash is not happening there. It could have corrupted memory elsewhere, though, so I gutted it so that it only queries and prints the MPI rank of each rank entering it. I added graceful exit code after the call to AMPI_Migrate, but that is evidently not reached. I understand that this information is not enough for you to identify the problem, but at present I don’t know where to start, since the error occurs in code that I did not write. Could you give me some pointers where to start? Thanks!
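Concretely, the instrumentation around the call looks roughly like the sketch below (my_ID and iter are the variable names from the compiler warnings in the build output further down; the wrapper function, barrier, and exit path are a simplified reconstruction, not the code verbatim, and AMPI_Migrate() is shown with no arguments as in the 6.7 interface):

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>   /* AMPI's mpi.h also declares AMPI_Migrate */

/* Simplified sketch of the instrumented call site. */
static void migrate_and_bail(int my_ID, long iter)
{
  /* %ld matches the long iteration counter flagged by -Wformat below. */
  printf("Rank %d about to call AMPI_Migrate in iter %ld\n", my_ID, iter);
  fflush(stdout);

  AMPI_Migrate();

  printf("Rank %d called AMPI_Migrate in iter %ld\n", my_ID, iter);
  fflush(stdout);

  /* Graceful exit right after migration, to confirm whether control ever
   * returns from AMPI_Migrate. It does not: the second print and this
   * exit path are never reached. */
  MPI_Barrier(MPI_COMM_WORLD);
  MPI_Finalize();
  exit(EXIT_SUCCESS);
}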
Below is some relevant output. If I replace the RotateLB load balancer with RefineLB, some ranks do pass the AMPI_Migrate call, but that is evidently because the load balancer left them alone.
Rob
rfvander@klondike:~/esg-prk-devel/AMPI/AMR$ make clean; make amr USE_PUPER=1
rm -f amr.o MPI_bail_out.o wtime.o amr *.optrpt *~ charmrun stats.json amr.decl.h amr.def.h
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicc -O3 -std=c99 -DADAPTIVE_MPI -DRESTRICT_KEYWORD=0 -DVERBOSE=0 -DDOUBLE=1 -DRADIUS=2 -DSTAR=1 -DLOOPGEN=0 -DUSE_PUPER=1 -I../../include -c amr.c
In file included from amr.c:66:0:
../../include/par-res-kern_general.h: In function ‘prk_malloc’:
../../include/par-res-kern_general.h:136:11: warning: implicit declaration of function ‘posix_memalign’ [-Wimplicit-function-declaration]
ret = posix_memalign(&ptr,alignment,bytes);
^
amr.c: In function ‘AMPI_Main’:
amr.c:842:14: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘long int’ [-Wformat=]
printf("ERROR: rank %d's BG work tile smaller than stencil radius: %d\n",
^
amr.c:1080:14: warning: format ‘%d’ expects argument of type ‘int’, but argument 4 has type ‘long int’ [-Wformat=]
printf("ERROR: rank %d's work tile %d smaller than stencil radius: %d\n",
^
amr.c:1518:14: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘long int’ [-Wformat=]
printf("Rank %d about to call AMPI_Migrate in iter %d\n", my_ID, iter);
^
amr.c:1520:14: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘long int’ [-Wformat=]
printf("Rank %d called AMPI_Migrate in iter %d\n", my_ID, iter);
^
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicc -O3 -std=c99 -DADAPTIVE_MPI -DRESTRICT_KEYWORD=0 -DVERBOSE=0 -DDOUBLE=1 -DRADIUS=2 -DSTAR=1 -DLOOPGEN=0 -DUSE_PUPER=1 -I../../include -c ../../common/MPI_bail_out.c
In file included from ../../common/MPI_bail_out.c:51:0:
../../include/par-res-kern_general.h: In function ‘prk_malloc’:
../../include/par-res-kern_general.h:136:11: warning: implicit declaration of function ‘posix_memalign’ [-Wimplicit-function-declaration]
ret = posix_memalign(&ptr,alignment,bytes);
^
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicc -O3 -std=c99 -DADAPTIVE_MPI -DRESTRICT_KEYWORD=0 -DVERBOSE=0 -DDOUBLE=1 -DRADIUS=2 -DSTAR=1 -DLOOPGEN=0 -DUSE_PUPER=1 -I../../include -c ../../common/wtime.c
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicc -language ampi -o amr -O3 -std=c99 -DADAPTIVE_MPI amr.o MPI_bail_out.o wtime.o -lm -module CommonLBs
cc1plus: warning: command line option ‘-std=c99’ is valid for C/ObjC but not for C++
rfvander@klondike:~/esg-prk-devel/AMPI/AMR$ /opt/charm/charm-6.7.0/bin/charmrun ./amr 20 1000 500 3 10 5 1 FINE_GRAIN +p 8 +vp 16 +balancer RotateLB +LBDebug 1
Running command: ./amr 20 1000 500 3 10 5 1 FINE_GRAIN +p 8 +vp 16 +balancer RotateLB +LBDebug 1
Charm++: standalone mode (not using charmrun)
Charm++> Running in Multicore mode: 8 threads
Converse/Charm++ Commit ID: v6.7.0-1-gca55e1d
Warning> Randomization of stack pointer is turned on in kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try run with '+isomalloc_sync'.
CharmLB> Verbose level 1, load balancing period: 0.5 seconds
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (16-way SMP).
Charm++> cpu topology info is gathered in 0.001 seconds.
[0] RotateLB created
Parallel Research Kernels Version 2.17
MPI AMR stencil execution on 2D grid
Number of ranks = 16
Background grid size = 1000
Radius of stencil = 2
Tiles in x/y-direction on BG = 4/4
Tiles in x/y-direction on ref 0 = 4/4
Tiles in x/y-direction on ref 1 = 4/4
Tiles in x/y-direction on ref 2 = 4/4
Tiles in x/y-direction on ref 3 = 4/4
Type of stencil = star
Data type = double precision
Compact representation of stencil loop body
Number of iterations = 20
Load balancer = FINE_GRAIN
Refinement rank spread = 16
Refinements:
Background grid points = 500
Grid size = 3993
Refinement level = 3
Period = 10
Duration = 5
Sub-iterations = 1
Rank 12 about to call AMPI_Migrate in iter 0
Rank 12 entered dchunkpup
Rank 7 about to call AMPI_Migrate in iter 0
Rank 7 entered dchunkpup
Rank 8 about to call AMPI_Migrate in iter 0
Rank 8 entered dchunkpup
Rank 4 about to call AMPI_Migrate in iter 0
Rank 4 entered dchunkpup
Rank 15 about to call AMPI_Migrate in iter 0
Rank 15 entered dchunkpup
Rank 11 about to call AMPI_Migrate in iter 0
Rank 11 entered dchunkpup
Rank 3 about to call AMPI_Migrate in iter 0
Rank 1 about to call AMPI_Migrate in iter 0
Rank 1 entered dchunkpup
Rank 3 entered dchunkpup
Rank 13 about to call AMPI_Migrate in iter 0
Rank 13 entered dchunkpup
Rank 6 about to call AMPI_Migrate in iter 0
Rank 6 entered dchunkpup
Rank 0 about to call AMPI_Migrate in iter 0
Rank 0 entered dchunkpup
Rank 9 about to call AMPI_Migrate in iter 0
Rank 9 entered dchunkpup
Rank 5 about to call AMPI_Migrate in iter 0
Rank 5 entered dchunkpup
Rank 2 about to call AMPI_Migrate in iter 0
Rank 2 entered dchunkpup
Rank 10 about to call AMPI_Migrate in iter 0
Rank 10 entered dchunkpup
Rank 14 about to call AMPI_Migrate in iter 0
Rank 14 entered dchunkpup
CharmLB> RotateLB: PE [0] step 0 starting at 0.507547 Memory: 990.820312 MB
CharmLB> RotateLB: PE [0] strategy starting at 0.511685
CharmLB> RotateLB: PE [0] Memory: LBManager: 920 KB CentralLB: 19 KB
CharmLB> RotateLB: PE [0] #Objects migrating: 16, LBMigrateMsg size: 0.00 MB
CharmLB> RotateLB: PE [0] strategy finished at 0.511696 duration 0.000011 s
Segmentation fault (core dumped)