charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
List archive
- From: "Van Der Wijngaart, Rob F" <rob.f.van.der.wijngaart AT intel.com>
- To: Sam White <white67 AT illinois.edu>
- Cc: "Miller, Philip B" <mille121 AT illinois.edu>, "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
- Subject: RE: [charm] Adaptive MPI
- Date: Tue, 29 Nov 2016 17:12:39 +0000
- Accept-language: en-US
I wanted to add that I observed the same performance anomaly with the code that my intern, Evangelos Georganas, wrote last year, which I compiled and ran with Charm++ version 6.7.0. My experiments are all on a single shared-memory node. Should I expect things to be more favorable for the explicit PUP method on a distributed-memory system? Thanks.

From: Van Der Wijngaart, Rob F
Hi Sam,
I switched to the 6.7.1 development branch and am now able to run to completion most of the time. I did notice the following peculiar behaviors. With a smallish data set (a few tens of MB), the isomalloc version without PUP runs at about the same speed as the one with an explicit PUP routine (I am using LBRotate). With a larger data set (about 100 times the smallish set), the isomalloc version without PUP consistently runs about 3X faster than the one with the explicit PUP routine. In my PUP routine I have a lot of individual pup calls (about 100). Do you observe the same in your regression tests? And are there known conditions under which pure isomalloc is faster than explicit PUP routines?

The explicit PUP version runs out of memory sooner than the pure isomalloc version. When I look at the output from LBDebug, I see that the load balancer uses the same amount of memory regardless of explicit PUP or pure isomalloc. With pure isomalloc and the larger problem I see a very consistent duration for the load balancing step (LBDebug reports about 0.0025s per step), whereas for the PUP version it is consistently about 1.2s.
Thanks.
Rob
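[Editor's note: for readers unfamiliar with the explicit-PUP approach discussed above, here is a minimal sketch of what such a routine looks like using Charm++'s C PUP API (pup_c.h). This is not Rob's code: the rank_state struct, its field names, pup_rank_state, and the registration call AMPI_Register_pup (the newer-interface name; the 6.7.0 interface differs) are illustrative assumptions. The pup_* calls themselves are from pup_c.h. Note the large array is packed with one aggregate call rather than element by element.]

/* Sketch of an explicit AMPI PUP routine for per-rank heap data. */
#include <stdlib.h>
#include "pup_c.h"   /* Charm++ C PUP API: pup_er, pup_int, pup_doubles, ... */
#include <mpi.h>     /* AMPI's mpi.h (assumed to declare AMPI_Register_pup) */

typedef struct {
  int     n;        /* number of grid points owned by this rank (illustrative) */
  double *grid;     /* large heap buffer that must migrate with the rank */
  double  dt, t;    /* a couple of scalar parameters */
} rank_state;

/* Called by the runtime when sizing, packing, and unpacking a migrating rank. */
static void pup_rank_state(pup_er p, void *d) {
  rank_state *s = (rank_state *)d;

  pup_int(p, &s->n);
  pup_double(p, &s->dt);
  pup_double(p, &s->t);

  if (pup_isUnpacking(p))                 /* re-create heap storage on the new PE */
    s->grid = (double *)malloc(s->n * sizeof(double));

  pup_doubles(p, s->grid, s->n);          /* one aggregate call for the big array,
                                             rather than n scalar pup_double calls */

  if (pup_isDeleting(p))                  /* free the source copy after packing */
    free(s->grid);
}

/* Registration, once per rank after MPI_Init (newer-interface name assumed): */
void register_state(rank_state *s) {
  int idx;
  AMPI_Register_pup(pup_rank_state, s, &idx);
}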
From: samt.white AT gmail.com [mailto:samt.white AT gmail.com] On Behalf Of Sam White
That is a known issue in the 6.7.1 release. I mentioned this issue in the previous 'When to migrate' thread to try to save you this effort. As things stand, you can either use the 6.7.0 release with the old interface, or use mainline/development charm until 6.8.0 is released in the coming weeks. The development version is tested nightly on various platforms and contains many bug fixes and performance improvements over 6.7.0.
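[Editor's note: for context on "the old interface," the sketch below shows what a migration call looks like with the newer (6.8-era) AMPI interface, where hints are passed through an MPI_Info object; the older interface used a plain MPI_Migrate() with no arguments. Which interface change the thread refers to, and the exact info key/value shown, are assumptions based on the AMPI manual and should be verified against your charm version.]

/* Hedged sketch of a periodic migration call with the newer AMPI interface. */
#include <mpi.h>

void maybe_migrate(int iteration, int period) {
  if (iteration % period == 0) {   /* migrate at a user-chosen interval */
    MPI_Info hints;
    MPI_Info_create(&hints);
    /* info key/value assumed from the AMPI manual ("sync" load balancing) */
    MPI_Info_set(hints, "ampi_load_balance", "sync");
    AMPI_Migrate(hints);           /* collective: ranks may move between PEs here */
    MPI_Info_free(&hints);
  }
}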
On Mon, Nov 28, 2016 at 5:58 PM, Van Der Wijngaart, Rob F <rob.f.van.der.wijngaart AT intel.com> wrote: