- From: Sam White <white67 AT illinois.edu>
- To: "Van Der Wijngaart, Rob F" <rob.f.van.der.wijngaart AT intel.com>
- Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
- Subject: Re: [charm] Adaptive MPI
- Date: Wed, 23 Nov 2016 14:21:13 -0600
To debug the issue with your PUP code, I would suggest adding print statements before and after your AMPI_Migrate() call, and inside the PUP routine. For these types of issues, it often helps to see where in the PUP process (sizing, packing, deleting, unpacking) the runtime is when it fails.
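For example, something along these lines (a minimal sketch using the C PUP interface from pup_c.h; the chunk_t struct, its fields, and the routine name are placeholders for whatever your dchunkpup actually handles, not your code):

#include <stdio.h>
#include <stdlib.h>
#include "pup_c.h"   /* Charm++ C PUP interface: pup_er, pup_int, pup_doubles, ... */

/* Placeholder for the per-rank state your PUP routine manages. */
typedef struct {
  int     n;      /* number of doubles owned by this rank */
  double *grid;   /* heap buffer that must be packed/unpacked */
} chunk_t;

/* PUP routine in the style of dchunkpup; "data" is the user pointer
 * that was registered with the AMPI runtime for this rank. */
void chunk_pup(pup_er p, void *data)
{
  chunk_t *c = (chunk_t *)data;

  /* Report which PUP phase we are in, so a crash can be localized. */
  if (pup_isSizing(p))    printf("PUP: sizing\n");
  if (pup_isPacking(p))   printf("PUP: packing\n");
  if (pup_isUnpacking(p)) printf("PUP: unpacking\n");
  fflush(stdout);

  pup_int(p, &c->n);

  /* On unpacking, the heap buffer does not exist yet and must be reallocated. */
  if (pup_isUnpacking(p))
    c->grid = (double *)malloc(c->n * sizeof(double));

  pup_doubles(p, c->grid, (size_t)c->n);

  /* After the final packing pass the source copy of the heap data is freed. */
  if (pup_isPacking(p) && pup_isDeleting(p)) {
    free(c->grid);
    c->grid = NULL;
  }

  printf("PUP: phase done\n");
  fflush(stdout);
}

The last phase message printed before the crash then tells you whether the failure happens during sizing, packing, deleting, or unpacking.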
-Sam
Hi Sam,
The first experiment was successful, but the isomalloc example hangs. See below. Unless it is a symptom of something bigger, I am not going to worry about the latter, since I wasn’t planning to use isomalloc for heap migration anyway. My regular MPI code, on which the AMPI version is based, runs fine for all the parameters I have tried, but I reckon that it may contain a memory bug that manifests itself only with load balancing.
Rob
rfvander@klondike:~/Cjacobi3D$ make
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx -c jacobi.C
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx -o jacobi jacobi.o -module CommonLBs -lm
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx -c -DNO_PUP jacobi.C -o jacobi.iso.o
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx -o jacobi.iso jacobi.iso.o -module CommonLBs -memory isomalloc
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx -c -tlsglobal jacobi.C -o jacobi.tls.o
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx -o jacobi.tls jacobi.tls.o -tlsglobal -module CommonLBs #-memory isomalloc
/opt/charm/charm-6.7.0/multicore-linux64/bin/../lib/libconv-util.a(sockRoutines.o): In function `skt_lookup_ip':
sockRoutines.c:(.text+0x334): warning: Using 'gethostbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx -c jacobi-get.C
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicxx -o jacobi-get jacobi-get.o -module CommonLBs -lm
rfvander@klondike:~/Cjacobi3D$ ./charmrun +p3 ./jacobi 2 2 2 +vp8 +balancer RotateLB +LBDebug 1
Running command: ./jacobi 2 2 2 +vp8 +balancer RotateLB +LBDebug 1 +p3
Charm++: standalone mode (not using charmrun)
Charm++> Running in Multicore mode: 3 threads
Converse/Charm++ Commit ID: v6.7.0-1-gca55e1d
Warning> Randomization of stack pointer is turned on in kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try run with '+isomalloc_sync'.
CharmLB> Verbose level 1, load balancing period: 0.5 seconds
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (16-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
[0] RotateLB created
iter 1 time: 0.142733 maxerr: 2020.200000
iter 2 time: 0.157225 maxerr: 1696.968000
iter 3 time: 0.172039 maxerr: 1477.170240
iter 4 time: 0.146178 maxerr: 1319.433024
iter 5 time: 0.123098 maxerr: 1200.918072
iter 6 time: 0.131063 maxerr: 1108.425519
iter 7 time: 0.138213 maxerr: 1033.970839
iter 8 time: 0.138295 maxerr: 972.509242
iter 9 time: 0.138113 maxerr: 920.721889
iter 10 time: 0.121553 maxerr: 876.344030
CharmLB> RotateLB: PE [0] step 0 starting at 1.489509 Memory: 72.253906 MB
CharmLB> RotateLB: PE [0] strategy starting at 1.489573
CharmLB> RotateLB: PE [0] Memory: LBManager: 920 KB CentralLB: 3 KB
CharmLB> RotateLB: PE [0] #Objects migrating: 8, LBMigrateMsg size: 0.00 MB
CharmLB> RotateLB: PE [0] strategy finished at 1.489592 duration 0.000019 s
CharmLB> RotateLB: PE [0] step 0 finished at 1.507922 duration 0.018413 s
iter 11 time: 0.152840 maxerr: 837.779089
iter 12 time: 0.136401 maxerr: 803.868831
iter 13 time: 0.138095 maxerr: 773.751705
iter 14 time: 0.139319 maxerr: 746.772667
iter 15 time: 0.139327 maxerr: 722.424056
iter 16 time: 0.141794 maxerr: 700.305763
iter 17 time: 0.142484 maxerr: 680.097726
iter 18 time: 0.141056 maxerr: 661.540528
iter 19 time: 0.153895 maxerr: 644.421422
iter 20 time: 0.198588 maxerr: 628.564089
[Partition 0][Node 0] End of program
rfvander@klondike:~/Cjacobi3D$ ./charmrun +p3 ./jacobi.iso 2 2 2 +vp8 +balancer RotateLB +LBDebug 1
Running command: ./jacobi.iso 2 2 2 +vp8 +balancer RotateLB +LBDebug 1 +p3
Charm++: standalone mode (not using charmrun)
Charm++> Running in Multicore mode: 3 threads
^C
rfvander@klondike:~/Cjacobi3D$ ./charmrun +p3 ./jacobi.iso 2 2 2 +vp8 +balancer RotateLB +LBDebug 1 +isomalloc_sync
Running command: ./jacobi.iso 2 2 2 +vp8 +balancer RotateLB +LBDebug 1 +isomalloc_sync +p3
Charm++: standalone mode (not using charmrun)
Charm++> Running in Multicore mode: 3 threads
From: samt.white AT gmail.com [mailto:samt.white AT gmail.com] On Behalf Of Sam White
Sent: Wednesday, November 23, 2016 7:10 AM
To: Van Der Wijngaart, Rob F <rob.f.van.der.wijngaart AT intel.com>
Cc: charm AT cs.uiuc.edu
Subject: Re: Adaptive MPI
Can you try an example AMPI program with load balancing? You can try charm/examples/ampi/Cjacobi3D/, running with something like './charmrun +p3 ./jacobi 2 2 2 +vp8 +balancer RotateLB +LBDebug 1'. You can also test that example with Isomalloc by running jacobi.iso (and, as the warning in the Charm preamble output suggests, run with +isomalloc_sync). It also might help to build Charm++/AMPI with '-g' to get stack traces.
-Sam
On Wed, Nov 23, 2016 at 2:19 AM, Van Der Wijngaart, Rob F <rob.f.van.der.wijngaart AT intel.com> wrote:
Hello Team,
I am trying to troubleshoot my Adaptive MPI code that uses dynamic load balancing. It crashes with a segmentation fault in AMPI_Migrate. I checked, and dchunkpup (which I supplied) is called within AMPI_Migrate and finishes on all ranks. That is not to say it is correct, but the crash is not happening there. It could have corrupted memory elsewhere, though, so I gutted it so that it only queries and prints the MPI rank of each rank entering it. I added graceful exit code after the call to AMPI_Migrate, but that is evidently not reached. I understand that this information is not enough for you to identify the problem, but at present I don’t know where to start, since the error occurs in code that I did not write. Could you give me some pointers where to start? Thanks!
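Concretely, the instrumentation around the call looks roughly like the sketch below (my_ID and iter are the variable names from the compiler warnings in the build output further down; the wrapper function, barrier, and exit path are a simplified reconstruction, not the code verbatim, and AMPI_Migrate() is shown with no arguments as in the 6.7 interface):

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>   /* AMPI's mpi.h also declares AMPI_Migrate */

/* Simplified sketch of the instrumented call site. */
static void migrate_and_bail(int my_ID, long iter)
{
  /* %ld matches the long iteration counter flagged by -Wformat below. */
  printf("Rank %d about to call AMPI_Migrate in iter %ld\n", my_ID, iter);
  fflush(stdout);

  AMPI_Migrate();

  printf("Rank %d called AMPI_Migrate in iter %ld\n", my_ID, iter);
  fflush(stdout);

  /* Graceful exit right after migration, to confirm whether control ever
   * returns from AMPI_Migrate. It does not: the second print and this
   * exit path are never reached. */
  MPI_Barrier(MPI_COMM_WORLD);
  MPI_Finalize();
  exit(EXIT_SUCCESS);
}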
Below is some relevant output. If I replace the RotateLB load balancer with RefineLB, some ranks do pass the AMPI_Migrate call, but that is evidently because the load balancer left them alone.
Rob
rfvander@klondike:~/esg-prk-devel/AMPI/AMR$ make clean; make amr USE_PUPER=1
rm -f amr.o MPI_bail_out.o wtime.o amr *.optrpt *~ charmrun stats.json amr.decl.h amr.def.h
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicc -O3 -std=c99 -DADAPTIVE_MPI -DRESTRICT_KEYWORD=0 -DVERBOSE=0 -DDOUBLE=1 -DRADIUS=2 -DSTAR=1 -DLOOPGEN=0 -DUSE_PUPER=1 -I../../include -c amr.c
In file included from amr.c:66:0:
../../include/par-res-kern_general.h: In function ‘prk_malloc’:
../../include/par-res-kern_general.h:136:11: warning: implicit declaration of function ‘posix_memalign’ [-Wimplicit-function-declaration]
ret = posix_memalign(&ptr,alignment,bytes);
^
amr.c: In function ‘AMPI_Main’:
amr.c:842:14: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘long int’ [-Wformat=]
printf("ERROR: rank %d's BG work tile smaller than stencil radius: %d\n",
^
amr.c:1080:14: warning: format ‘%d’ expects argument of type ‘int’, but argument 4 has type ‘long int’ [-Wformat=]
printf("ERROR: rank %d's work tile %d smaller than stencil radius: %d\n",
^
amr.c:1518:14: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘long int’ [-Wformat=]
printf("Rank %d about to call AMPI_Migrate in iter %d\n", my_ID, iter);
^
amr.c:1520:14: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘long int’ [-Wformat=]
printf("Rank %d called AMPI_Migrate in iter %d\n", my_ID, iter);
^
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicc -O3 -std=c99 -DADAPTIVE_MPI -DRESTRICT_KEYWORD=0 -DVERBOSE=0 -DDOUBLE=1 -DRADIUS=2 -DSTAR=1 -DLOOPGEN=0 -DUSE_PUPER=1 -I../../include -c ../../common/MPI_bail_out.c
In file included from ../../common/MPI_bail_out.c:51:0:
../../include/par-res-kern_general.h: In function ‘prk_malloc’:
../../include/par-res-kern_general.h:136:11: warning: implicit declaration of function ‘posix_memalign’ [-Wimplicit-function-declaration]
ret = posix_memalign(&ptr,alignment,bytes);
^
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicc -O3 -std=c99 -DADAPTIVE_MPI -DRESTRICT_KEYWORD=0 -DVERBOSE=0 -DDOUBLE=1 -DRADIUS=2 -DSTAR=1 -DLOOPGEN=0 -DUSE_PUPER=1 -I../../include -c ../../common/wtime.c
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicc -language ampi -o amr -O3 -std=c99 -DADAPTIVE_MPI amr.o MPI_bail_out.o wtime.o -lm -module CommonLBs
cc1plus: warning: command line option ‘-std=c99’ is valid for C/ObjC but not for C++
rfvander@klondike:~/esg-prk-devel/AMPI/AMR$ /opt/charm/charm-6.7.0/bin/charmrun ./amr 20 1000 500 3 10 5 1 FINE_GRAIN +p 8 +vp 16 +balancer RotateLB +LBDebug 1
Running command: ./amr 20 1000 500 3 10 5 1 FINE_GRAIN +p 8 +vp 16 +balancer RotateLB +LBDebug 1
Charm++: standalone mode (not using charmrun)
Charm++> Running in Multicore mode: 8 threads
Converse/Charm++ Commit ID: v6.7.0-1-gca55e1d
Warning> Randomization of stack pointer is turned on in kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try run with '+isomalloc_sync'.
CharmLB> Verbose level 1, load balancing period: 0.5 seconds
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (16-way SMP).
Charm++> cpu topology info is gathered in 0.001 seconds.
[0] RotateLB created
Parallel Research Kernels Version 2.17
MPI AMR stencil execution on 2D grid
Number of ranks = 16
Background grid size = 1000
Radius of stencil = 2
Tiles in x/y-direction on BG = 4/4
Tiles in x/y-direction on ref 0 = 4/4
Tiles in x/y-direction on ref 1 = 4/4
Tiles in x/y-direction on ref 2 = 4/4
Tiles in x/y-direction on ref 3 = 4/4
Type of stencil = star
Data type = double precision
Compact representation of stencil loop body
Number of iterations = 20
Load balancer = FINE_GRAIN
Refinement rank spread = 16
Refinements:
Background grid points = 500
Grid size = 3993
Refinement level = 3
Period = 10
Duration = 5
Sub-iterations = 1
Rank 12 about to call AMPI_Migrate in iter 0
Rank 12 entered dchunkpup
Rank 7 about to call AMPI_Migrate in iter 0
Rank 7 entered dchunkpup
Rank 8 about to call AMPI_Migrate in iter 0
Rank 8 entered dchunkpup
Rank 4 about to call AMPI_Migrate in iter 0
Rank 4 entered dchunkpup
Rank 15 about to call AMPI_Migrate in iter 0
Rank 15 entered dchunkpup
Rank 11 about to call AMPI_Migrate in iter 0
Rank 11 entered dchunkpup
Rank 3 about to call AMPI_Migrate in iter 0
Rank 1 about to call AMPI_Migrate in iter 0
Rank 1 entered dchunkpup
Rank 3 entered dchunkpup
Rank 13 about to call AMPI_Migrate in iter 0
Rank 13 entered dchunkpup
Rank 6 about to call AMPI_Migrate in iter 0
Rank 6 entered dchunkpup
Rank 0 about to call AMPI_Migrate in iter 0
Rank 0 entered dchunkpup
Rank 9 about to call AMPI_Migrate in iter 0
Rank 9 entered dchunkpup
Rank 5 about to call AMPI_Migrate in iter 0
Rank 5 entered dchunkpup
Rank 2 about to call AMPI_Migrate in iter 0
Rank 2 entered dchunkpup
Rank 10 about to call AMPI_Migrate in iter 0
Rank 10 entered dchunkpup
Rank 14 about to call AMPI_Migrate in iter 0
Rank 14 entered dchunkpup
CharmLB> RotateLB: PE [0] step 0 starting at 0.507547 Memory: 990.820312 MB
CharmLB> RotateLB: PE [0] strategy starting at 0.511685
CharmLB> RotateLB: PE [0] Memory: LBManager: 920 KB CentralLB: 19 KB
CharmLB> RotateLB: PE [0] #Objects migrating: 16, LBMigrateMsg size: 0.00 MB
CharmLB> RotateLB: PE [0] strategy finished at 0.511696 duration 0.000011 s
Segmentation fault (core dumped)