charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
- From: "Van Der Wijngaart, Rob F" <rob.f.van.der.wijngaart AT intel.com>
- To: Phil Miller <mille121 AT illinois.edu>
- Cc: Sam White <white67 AT illinois.edu>, "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
- Subject: RE: [charm] Adaptive MPI
- Date: Mon, 28 Nov 2016 23:58:54 +0000
- Accept-language: en-US
For now I am overriding the load balancer test in the code that reads its key value and am just executing TCHARM_Migrate() whenever the key is found, regardless of its value. Keep fingers crossed.

From: Van Der Wijngaart, Rob F
Hi Phil,
So far I had been using charm-6.7.0, but I started to notice errors that appeared to be caused by the migration routines in AMPI, so I tried the new version, 6.7.1. In 6.7.1 the load balancing hints appear to be corrupted when they are read. Please see below for a run of the example in examples/ampi/Cjacobi3D. The first time the value of the load balancer key is read it is correct, but all subsequent times, when it is actually used, the library appends a random character. I inserted a debug line that prints: key 0 equals ampi_load_balance with value sync
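One common way a string value picks up a stray trailing character is a missing NUL terminator: the value is stored or copied together with its length but without the terminating '\0', so code that later treats it as a C string reads whatever byte happens to follow. The thread does not establish that this is what happened in AMPI 6.7.1. A related pitfall is that the valuelen argument of MPI_Info_get excludes the terminator, so the caller's buffer must hold valuelen + 1 characters. The C++ sketch below shows a length-aware copy that always terminates its output; `copy_value` is a hypothetical helper, not part of AMPI or MPI.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: copy a value whose length is known (src_len characters,
// not necessarily NUL-terminated) into dst, a buffer of dst_size bytes.
// At most dst_size - 1 characters are copied and dst is always NUL-terminated,
// following the MPI_Info_get convention that the buffer holds valuelen + 1 chars.
// Returns false if the value had to be truncated.
bool copy_value(const char* src, std::size_t src_len, char* dst, std::size_t dst_size) {
    if (dst_size == 0) return false;  // no room even for the terminator
    std::size_t n = src_len < dst_size - 1 ? src_len : dst_size - 1;
    std::memcpy(dst, src, n);         // copy only the real characters
    dst[n] = '\0';                    // never rely on src being terminated
    return n == src_len;
}
```

For example, if the stored bytes are `synca` but the recorded length is 4, `copy_value` produces `sync`, whereas reading the same storage as a NUL-terminated string yields `synca`.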
Rob
rfvander@klondike:~/charm-6.7.1/examples/ampi/Cjacobi3D$ $HOME/charm-6.7.1/bin/charmrun ./jacobi 2 2 2 30 +p 2 +vp 8 +isomalloc_sync +balancer RotateLB +LBDebug 1
Running command: ./jacobi 2 2 2 30 +p 2 +vp 8 +isomalloc_sync +balancer RotateLB +LBDebug 1
Charm++: standalone mode (not using charmrun)
Charm++> Running in Multicore mode: 2 threads
Converse/Charm++ Commit ID:
Warning> Randomization of stack pointer is turned on in kernel.
Charm++> synchronizing isomalloc memory region...
[0] consolidated Isomalloc memory region: 0x440000000 - 0x7f5d00000000 (133532672 megs)
CharmLB> Verbose level 1, load balancing period: 0.5 seconds
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (16-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
[0] RotateLB created
iter 1 time: 0.078998 maxerr: 2020.200000
iter 2 time: 0.059326 maxerr: 1696.968000
iter 3 time: 0.050306 maxerr: 1477.170240
iter 4 time: 0.045964 maxerr: 1319.433024
iter 5 time: 0.045959 maxerr: 1200.918072
iter 6 time: 0.045985 maxerr: 1108.425519
iter 7 time: 0.045932 maxerr: 1033.970839
iter 8 time: 0.045992 maxerr: 972.509242
iter 9 time: 0.045941 maxerr: 920.721889
iter 10 time: 0.045945 maxerr: 876.344030
key 0 equals ampi_load_balance with value sync
key 0 equals ampi_load_balance with value sync
key 0 equals ampi_load_balance with value sync
key 0 equals ampi_load_balance with value sync
key 0 equals ampi_load_balance with value sync
key 0 equals ampi_load_balance with value sync
key 0 equals ampi_load_balance with value sync
key 0 equals ampi_load_balance with value sync
CharmLB> RotateLB: PE [0] step 0 starting at 0.853504 Memory: 72.253906 MB
CharmLB> RotateLB: PE [0] strategy starting at 0.853559
CharmLB> RotateLB: PE [0] Memory: LBManager: 920 KB CentralLB: 3 KB
CharmLB> RotateLB: PE [0] #Objects migrating: 8, LBMigrateMsg size: 0.00 MB
CharmLB> RotateLB: PE [0] strategy finished at 0.853564 duration 0.000005 s
CharmLB> RotateLB: PE [0] step 0 finished at 0.882196 duration 0.028692 s
iter 11 time: 0.063316 maxerr: 837.779089
iter 12 time: 0.046134 maxerr: 803.868831
iter 13 time: 0.046079 maxerr: 773.751705
iter 14 time: 0.046063 maxerr: 746.772667
iter 15 time: 0.046088 maxerr: 722.424056
iter 16 time: 0.046083 maxerr: 700.305763
iter 17 time: 0.046087 maxerr: 680.097726
iter 18 time: 0.046047 maxerr: 661.540528
iter 19 time: 0.044149 maxerr: 644.421422
iter 20 time: 0.040968 maxerr: 628.564089
iter 21 time: 0.040264 maxerr: 613.821009
iter 22 time: 0.040429 maxerr: 600.067696
iter 23 time: 0.040471 maxerr: 587.198273
iter 24 time: 0.040278 maxerr: 575.122054
iter 25 time: 0.040325 maxerr: 563.760848
iter 26 time: 0.040425 maxerr: 553.046836
iter 27 time: 0.040186 maxerr: 542.920870
iter 28 time: 0.040066 maxerr: 533.331094
iter 29 time: 0.040020 maxerr: 524.231833
key 0 equals ampi_load_balance with value synca
WARNING: Unknown MPI_Info value (synca) given to AMPI_Migrate for key: ampi_load_balance
key 0 equals ampi_load_balance with value synca
WARNING: Unknown MPI_Info value (synca) given to AMPI_Migrate for key: ampi_load_balance
key 0 equals ampi_load_balance with value synca
WARNING: Unknown MPI_Info value (synca) given to AMPI_Migrate for key: ampi_load_balance
key 0 equals ampi_load_balance with value synca
WARNING: Unknown MPI_Info value (synca) given to AMPI_Migrate for key: ampi_load_balance
key 0 equals ampi_load_balance with value synca
WARNING: Unknown MPI_Info value (synca) given to AMPI_Migrate for key: ampi_load_balance
key 0 equals ampi_load_balance with value synca
WARNING: Unknown MPI_Info value (synca) given to AMPI_Migrate for key: ampi_load_balance
key 0 equals ampi_load_balance with value synca
WARNING: Unknown MPI_Info value (synca) given to AMPI_Migrate for key: ampi_load_balance
iter 30 time: 0.040080 maxerr: 515.582675
key 0 equals ampi_load_balance with value synca
WARNING: Unknown MPI_Info value (synca) given to AMPI_Migrate for key: ampi_load_balance
[Partition 0][Node 0] End of program
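The workaround in Rob's most recent message (executing TCHARM_Migrate() whenever the key is found, regardless of its value) replaces an exact comparison of the hint value with a test for the key's presence, so a corrupted value such as synca no longer suppresses migration. A minimal C++ sketch of the difference, assuming the hints have already been copied into a std::map; the key name comes from the log above, while the function names are hypothetical and the actual migration call is omitted.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory copy of the MPI_Info hints passed to the migration call.
using Hints = std::map<std::string, std::string>;

// Key name as it appears in the log output.
const std::string kLbKey = "ampi_load_balance";

// Original test: migrate only when the value is exactly "sync".
// A corrupted value such as "synca" makes this return false.
bool should_migrate_strict(const Hints& hints) {
    auto it = hints.find(kLbKey);
    return it != hints.end() && it->second == "sync";
}

// Workaround: migrate whenever the key is present, whatever its value.
bool should_migrate_workaround(const Hints& hints) {
    return hints.find(kLbKey) != hints.end();
}
```

The trade-off is that the workaround also migrates for values that should not trigger migration, so it is a stopgap until the value is read correctly.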
From: unmobile AT gmail.com [mailto:unmobile AT gmail.com] On Behalf Of Phil Miller
Sam: It seems like it should be straightforward to add an assertion in our API entry/exit tracking sentries to catch this kind of issue. Essentially, it would need to check that the calling thread is actually an AMPI process thread that's supposed to be running. We should also document that PUP routines for AMPI code can't call MPI routines.
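Phil's proposed check could take the form of an RAII sentry constructed at each API entry point: the constructor verifies the calling context and the destructor records the exit. This is a minimal C++ sketch under simplifying assumptions: the two thread_local flags stand in for AMPI's real bookkeeping (AMPI ranks are user-level threads, so an actual check would compare against the scheduler's notion of the currently running rank), all names are hypothetical, and the check throws std::logic_error instead of aborting so that it can be exercised in a test. The `tl_in_pup` flag covers the second point, that PUP routines must not call MPI.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical per-thread state standing in for AMPI's real bookkeeping.
thread_local bool tl_is_running_rank = false; // true while this thread runs as the scheduled AMPI rank
thread_local bool tl_in_pup = false;          // true while a PUP routine is executing

// Hypothetical sentry created at the entry of every MPI API function.
// The constructor checks the calling context; the destructor records the exit.
class ApiSentry {
public:
    ApiSentry() {
        if (!tl_is_running_rank)
            throw std::logic_error("MPI call from a thread that is not the running AMPI rank");
        if (tl_in_pup)
            throw std::logic_error("MPI call from inside a PUP routine");
        ++depth_;  // counted only once the checks pass
    }
    ~ApiSentry() { --depth_; }
    ApiSentry(const ApiSentry&) = delete;
    ApiSentry& operator=(const ApiSentry&) = delete;
    static int depth() { return depth_; }  // number of API calls currently active
private:
    static thread_local int depth_;
};
thread_local int ApiSentry::depth_ = 0;

// Hypothetical API entry point guarded by the sentry.
int example_api_call() {
    ApiSentry sentry;  // checks on entry, decrements depth on return
    return 0;
}
```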
On Thu, Nov 24, 2016 at 5:36 PM, Van Der Wijngaart, Rob F <rob.f.van.der.wijngaart AT intel.com> wrote:
- RE: [charm] Adaptive MPI, (continued)
- RE: [charm] Adaptive MPI, Van Der Wijngaart, Rob F, 11/23/2016
- Message not available
- Re: [charm] Adaptive MPI, Sam White, 11/23/2016
- RE: [charm] Adaptive MPI, Van Der Wijngaart, Rob F, 11/23/2016
- Message not available
- Re: [charm] Adaptive MPI, Sam White, 11/23/2016
- RE: [charm] Adaptive MPI, Van Der Wijngaart, Rob F, 11/23/2016
- RE: [charm] Adaptive MPI, Van Der Wijngaart, Rob F, 11/23/2016
- RE: [charm] Adaptive MPI, Van Der Wijngaart, Rob F, 11/24/2016
- Re: [charm] Adaptive MPI, Phil Miller, 11/25/2016
- RE: [charm] Adaptive MPI, Van Der Wijngaart, Rob F, 11/25/2016
- RE: [charm] Adaptive MPI, Van Der Wijngaart, Rob F, 11/28/2016
- RE: [charm] Adaptive MPI, Van Der Wijngaart, Rob F, 11/28/2016
- Message not available
- Re: [charm] Adaptive MPI, Sam White, 11/28/2016
- RE: [charm] Adaptive MPI, Van Der Wijngaart, Rob F, 11/28/2016
- RE: [charm] Adaptive MPI, Van Der Wijngaart, Rob F, 11/29/2016
- RE: [charm] Adaptive MPI, Van Der Wijngaart, Rob F, 11/29/2016
- RE: [charm] Adaptive MPI, Van Der Wijngaart, Rob F, 11/28/2016
- Re: [charm] Adaptive MPI, Sam White, 11/23/2016
- Re: [charm] Adaptive MPI, Sam White, 11/23/2016
Archive powered by MHonArc 2.6.19.