charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
- From: "Van Der Wijngaart, Rob F" <rob.f.van.der.wijngaart AT intel.com>
- To: Phil Miller <mille121 AT illinois.edu>
- Cc: Sam White <white67 AT illinois.edu>, "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
- Subject: RE: [charm] Adaptive MPI
- Date: Tue, 29 Nov 2016 00:07:47 +0000
No luck. The same type of string-processing error occurs at some point when the key itself is read; see below.
rfvander@klondike:~/charm-6.7.1/examples/ampi/Cjacobi3D$ $HOME/charm-6.7.1/bin/charmrun ./jacobi 2 2 2 1000 +p 2 +vp 8 +isomalloc_sync +balancer RotateLB +LBDebug 1
Running command: ./jacobi 2 2 2 1000 +p 2 +vp 8 +isomalloc_sync +balancer RotateLB +LBDebug 1
Charm++: standalone mode (not using charmrun)
Charm++> Running in Multicore mode: 2 threads
Converse/Charm++ Commit ID:
Warning> Randomization of stack pointer is turned on in kernel.
Charm++> synchronizing isomalloc memory region...
[0] consolidated Isomalloc memory region: 0x440000000 - 0x7f1a00000000 (133258240 megs)
CharmLB> Verbose level 1, load balancing period: 0.5 seconds
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (16-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
[0] RotateLB created
iter 1 time: 0.079971 maxerr: 2020.200000
iter 2 time: 0.059791 maxerr: 1696.968000
iter 3 time: 0.050566 maxerr: 1477.170240
iter 4 time: 0.046094 maxerr: 1319.433024
iter 5 time: 0.045918 maxerr: 1200.918072
iter 6 time: 0.045842 maxerr: 1108.425519
iter 7 time: 0.045895 maxerr: 1033.970839
iter 8 time: 0.045871 maxerr: 972.509242
iter 9 time: 0.045872 maxerr: 920.721889
iter 10 time: 0.045870 maxerr: 876.344030
CharmLB> RotateLB: PE [0] step 0 starting at 0.758304 Memory: 72.253906 MB
CharmLB> RotateLB: PE [0] strategy starting at 0.758354
CharmLB> RotateLB: PE [0] Memory: LBManager: 920 KB CentralLB: 3 KB
CharmLB> RotateLB: PE [0] #Objects migrating: 8, LBMigrateMsg size: 0.00 MB
CharmLB> RotateLB: PE [0] strategy finished at 0.758360 duration 0.000006 s
CharmLB> RotateLB: PE [0] step 0 finished at 0.786232 duration 0.027928 s
iter 11 time: 0.063298 maxerr: 837.779089
iter 12 time: 0.045806 maxerr: 803.868831
iter 13 time: 0.045729 maxerr: 773.751705
iter 14 time: 0.045843 maxerr: 746.772667
iter 15 time: 0.045770 maxerr: 722.424056
iter 16 time: 0.045805 maxerr: 700.305763
iter 17 time: 0.045858 maxerr: 680.097726
iter 18 time: 0.045809 maxerr: 661.540528
iter 19 time: 0.044910 maxerr: 644.421422
iter 20 time: 0.041548 maxerr: 628.564089
iter 21 time: 0.040014 maxerr: 613.821009
iter 22 time: 0.039945 maxerr: 600.067696
iter 23 time: 0.039926 maxerr: 587.198273
iter 24 time: 0.039924 maxerr: 575.122054
iter 25 time: 0.039885 maxerr: 563.760848
iter 26 time: 0.040128 maxerr: 553.046836
iter 27 time: 0.040071 maxerr: 542.920870
iter 28 time: 0.039904 maxerr: 533.331094
iter 29 time: 0.039919 maxerr: 524.231833
iter 30 time: 0.039921 maxerr: 515.582675
CharmLB> RotateLB: PE [0] step 1 starting at 1.648019 Memory: 75.172928 MB
CharmLB> RotateLB: PE [0] strategy starting at 1.648106
CharmLB> RotateLB: PE [0] Memory: LBManager: 920 KB CentralLB: 3 KB
CharmLB> RotateLB: PE [0] #Objects migrating: 8, LBMigrateMsg size: 0.00 MB
CharmLB> RotateLB: PE [0] strategy finished at 1.648112 duration 0.000006 s
CharmLB> RotateLB: PE [0] step 1 finished at 1.665523 duration 0.017504 s
iter 31 time: 0.050692 maxerr: 507.347718
iter 32 time: 0.040078 maxerr: 499.494943
iter 33 time: 0.040256 maxerr: 491.995690
iter 34 time: 0.040043 maxerr: 484.824219
iter 35 time: 0.040006 maxerr: 477.957338
iter 36 time: 0.040048 maxerr: 471.374089
iter 37 time: 0.040035 maxerr: 465.055477
iter 38 time: 0.040001 maxerr: 458.984241
iter 39 time: 0.040005 maxerr: 453.144656
iter 40 time: 0.040110 maxerr: 447.522361
iter 41 time: 0.040379 maxerr: 442.104210
iter 42 time: 0.040126 maxerr: 436.878145
iter 43 time: 0.040149 maxerr: 431.833082
iter 44 time: 0.040228 maxerr: 426.958810
iter 45 time: 0.040168 maxerr: 422.245909
iter 46 time: 0.040041 maxerr: 417.685669
iter 47 time: 0.040055 maxerr: 413.270025
iter 48 time: 0.040096 maxerr: 408.991494
iter 49 time: 0.039997 maxerr: 404.843126
iter 50 time: 0.040021 maxerr: 400.818454
CharmLB> RotateLB: PE [0] step 2 starting at 2.476987 Memory: 75.238968 MB
CharmLB> RotateLB: PE [0] strategy starting at 2.477029
CharmLB> RotateLB: PE [0] Memory: LBManager: 920 KB CentralLB: 3 KB
CharmLB> RotateLB: PE [0] #Objects migrating: 8, LBMigrateMsg size: 0.00 MB
CharmLB> RotateLB: PE [0] strategy finished at 2.477035 duration 0.000006 s
CharmLB> RotateLB: PE [0] step 2 finished at 2.493661 duration 0.016674 s
iter 51 time: 0.050363 maxerr: 396.911452
iter 52 time: 0.040102 maxerr: 393.116496
iter 53 time: 0.039939 maxerr: 389.428332
iter 54 time: 0.039998 maxerr: 385.842045
iter 55 time: 0.040045 maxerr: 382.353031
iter 56 time: 0.040046 maxerr: 378.956970
iter 57 time: 0.040027 maxerr: 375.649808
iter 58 time: 0.039957 maxerr: 372.427733
iter 59 time: 0.040017 maxerr: 369.287159
iter 60 time: 0.040044 maxerr: 366.224708
iter 61 time: 0.040012 maxerr: 363.237194
iter 62 time: 0.039956 maxerr: 360.321610
iter 63 time: 0.039989 maxerr: 357.475116
iter 64 time: 0.040022 maxerr: 354.695025
iter 65 time: 0.039989 maxerr: 351.978797
iter 66 time: 0.040025 maxerr: 349.324022
iter 67 time: 0.039996 maxerr: 346.728419
iter 68 time: 0.039968 maxerr: 344.189822
iter 69 time: 0.040082 maxerr: 341.706174
iter 70 time: 0.040181 maxerr: 339.275521
CharmLB> RotateLB: PE [0] step 3 starting at 3.302705 Memory: 75.305084 MB
CharmLB> RotateLB: PE [0] strategy starting at 3.302795
CharmLB> RotateLB: PE [0] Memory: LBManager: 920 KB CentralLB: 3 KB
CharmLB> RotateLB: PE [0] #Objects migrating: 8, LBMigrateMsg size: 0.00 MB
CharmLB> RotateLB: PE [0] strategy finished at 3.302802 duration 0.000007 s
CharmLB> RotateLB: PE [0] step 3 finished at 3.318951 duration 0.016246 s
iter 71 time: 0.049915 maxerr: 336.896006
iter 72 time: 0.040021 maxerr: 334.565860
iter 73 time: 0.040179 maxerr: 332.283400
iter 74 time: 0.040051 maxerr: 330.047020
iter 75 time: 0.040005 maxerr: 327.855193
iter 76 time: 0.040029 maxerr: 325.706456
iter 77 time: 0.040045 maxerr: 323.599418
iter 78 time: 0.040035 maxerr: 321.532746
iter 79 time: 0.040319 maxerr: 319.505169
iter 80 time: 0.040152 maxerr: 317.515469
iter 81 time: 0.040000 maxerr: 315.562481
iter 82 time: 0.040090 maxerr: 313.645090
iter 83 time: 0.040004 maxerr: 311.762228
iter 84 time: 0.040049 maxerr: 309.912871
iter 85 time: 0.040071 maxerr: 308.096037
iter 86 time: 0.039998 maxerr: 306.310783
iter 87 time: 0.040066 maxerr: 304.556206
iter 88 time: 0.039985 maxerr: 302.831437
iter 89 time: 0.040058 maxerr: 301.135641
iter 90 time: 0.040069 maxerr: 299.468016
WARNING: Unknown MPI_Info key given to AMPI_Migrate: ampi_load_balanceÿÿÿÿÿÿÿ%
From: Van Der Wijngaart, Rob F
For now I am bypassing the load balancer test in the code that reads the key's value, and am simply executing TCHARM_Migrate() whenever the key is found, regardless of its value. Fingers crossed.
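The workaround described above can be pictured with a small stand-alone C sketch. Everything in it is hypothetical and for illustration only: handle_hint(), migrate_stub(), and the ignore_value flag are invented names, the stub merely stands in for the real migration call so the code runs without the Charm++ runtime, and only the "sync" value that appears in this thread is modelled. It is not the AMPI source.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the real migration call; it only counts invocations so
 * the sketch can be exercised without the Charm++ runtime. */
static int migrate_calls = 0;
static void migrate_stub(void) { migrate_calls++; }

/* Hypothetical hint handler (not the AMPI implementation).
 * With ignore_value == 0 the value must be exactly "sync".
 * With ignore_value != 0 the value test is skipped and migration is
 * triggered whenever the key matches. Returns 1 if migration ran. */
static int handle_hint(const char *key, const char *value, int ignore_value)
{
    assert(key != NULL && value != NULL);
    if (strcmp(key, "ampi_load_balance") != 0) {
        return 0;   /* unrecognised key: nothing to do */
    }
    if (!ignore_value && strcmp(value, "sync") != 0) {
        return 0;   /* strict mode rejects a value such as "synca" */
    }
    migrate_stub();
    return 1;
}
```

Note that a sketch like this still depends on the key comparing equal, so it would not help if the key string itself arrived with trailing garbage, which is the kind of unknown-key warning the later run in this thread shows.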
From: Van Der Wijngaart, Rob F
Hi Phil,
Until now I had been using Charm++ 6.7.0, but I started to notice errors that appeared to be caused by the migration routines in AMPI, so I tried the new version, 6.7.1. The reading of the load balancing hints appears to be corrupted. Please see below for a run of an example from examples/ampi/Cjacobi3D. The first time the value of the load balancer key is read it is correct, but on all subsequent reads, when it is actually used, the library appends a random character. I inserted a debug line that prints output of the form: key 0 equals ampi_load_balance with value sync
Rob
rfvander@klondike:~/charm-6.7.1/examples/ampi/Cjacobi3D$ $HOME/charm-6.7.1/bin/charmrun ./jacobi 2 2 2 30 +p 2 +vp 8 +isomalloc_sync +balancer RotateLB +LBDebug 1
Running command: ./jacobi 2 2 2 30 +p 2 +vp 8 +isomalloc_sync +balancer RotateLB +LBDebug 1
Charm++: standalone mode (not using charmrun)
Charm++> Running in Multicore mode: 2 threads
Converse/Charm++ Commit ID:
Warning> Randomization of stack pointer is turned on in kernel.
Charm++> synchronizing isomalloc memory region...
[0] consolidated Isomalloc memory region: 0x440000000 - 0x7f5d00000000 (133532672 megs)
CharmLB> Verbose level 1, load balancing period: 0.5 seconds
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (16-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
[0] RotateLB created
iter 1 time: 0.078998 maxerr: 2020.200000
iter 2 time: 0.059326 maxerr: 1696.968000
iter 3 time: 0.050306 maxerr: 1477.170240
iter 4 time: 0.045964 maxerr: 1319.433024
iter 5 time: 0.045959 maxerr: 1200.918072
iter 6 time: 0.045985 maxerr: 1108.425519
iter 7 time: 0.045932 maxerr: 1033.970839
iter 8 time: 0.045992 maxerr: 972.509242
iter 9 time: 0.045941 maxerr: 920.721889
iter 10 time: 0.045945 maxerr: 876.344030
key 0 equals ampi_load_balance with value sync
key 0 equals ampi_load_balance with value sync
key 0 equals ampi_load_balance with value sync
key 0 equals ampi_load_balance with value sync
key 0 equals ampi_load_balance with value sync
key 0 equals ampi_load_balance with value sync
key 0 equals ampi_load_balance with value sync
key 0 equals ampi_load_balance with value sync
CharmLB> RotateLB: PE [0] step 0 starting at 0.853504 Memory: 72.253906 MB
CharmLB> RotateLB: PE [0] strategy starting at 0.853559
CharmLB> RotateLB: PE [0] Memory: LBManager: 920 KB CentralLB: 3 KB
CharmLB> RotateLB: PE [0] #Objects migrating: 8, LBMigrateMsg size: 0.00 MB
CharmLB> RotateLB: PE [0] strategy finished at 0.853564 duration 0.000005 s
CharmLB> RotateLB: PE [0] step 0 finished at 0.882196 duration 0.028692 s
iter 11 time: 0.063316 maxerr: 837.779089
iter 12 time: 0.046134 maxerr: 803.868831
iter 13 time: 0.046079 maxerr: 773.751705
iter 14 time: 0.046063 maxerr: 746.772667
iter 15 time: 0.046088 maxerr: 722.424056
iter 16 time: 0.046083 maxerr: 700.305763
iter 17 time: 0.046087 maxerr: 680.097726
iter 18 time: 0.046047 maxerr: 661.540528
iter 19 time: 0.044149 maxerr: 644.421422
iter 20 time: 0.040968 maxerr: 628.564089
iter 21 time: 0.040264 maxerr: 613.821009
iter 22 time: 0.040429 maxerr: 600.067696
iter 23 time: 0.040471 maxerr: 587.198273
iter 24 time: 0.040278 maxerr: 575.122054
iter 25 time: 0.040325 maxerr: 563.760848
iter 26 time: 0.040425 maxerr: 553.046836
iter 27 time: 0.040186 maxerr: 542.920870
iter 28 time: 0.040066 maxerr: 533.331094
iter 29 time: 0.040020 maxerr: 524.231833
key 0 equals ampi_load_balance with value synca
WARNING: Unknown MPI_Info value (synca) given to AMPI_Migrate for key: ampi_load_balance
key 0 equals ampi_load_balance with value synca
WARNING: Unknown MPI_Info value (synca) given to AMPI_Migrate for key: ampi_load_balance
key 0 equals ampi_load_balance with value synca
WARNING: Unknown MPI_Info value (synca) given to AMPI_Migrate for key: ampi_load_balance
key 0 equals ampi_load_balance with value synca
WARNING: Unknown MPI_Info value (synca) given to AMPI_Migrate for key: ampi_load_balance
key 0 equals ampi_load_balance with value synca
WARNING: Unknown MPI_Info value (synca) given to AMPI_Migrate for key: ampi_load_balance
key 0 equals ampi_load_balance with value synca
WARNING: Unknown MPI_Info value (synca) given to AMPI_Migrate for key: ampi_load_balance
key 0 equals ampi_load_balance with value synca
WARNING: Unknown MPI_Info value (synca) given to AMPI_Migrate for key: ampi_load_balance
iter 30 time: 0.040080 maxerr: 515.582675
key 0 equals ampi_load_balance with value synca
WARNING: Unknown MPI_Info value (synca) given to AMPI_Migrate for key: ampi_load_balance
[Partition 0][Node 0] End of program
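A value that prints as "synca" where "sync" was stored is the kind of symptom seen when a string is read past its intended end. The thread does not establish the cause, so the following plain C sketch only illustrates that class of symptom and one defensive pattern against it; it is an assumption for illustration, not a diagnosis of AMPI. The helper names copy_value_terminated() and value_is_sync() are hypothetical and are not AMPI functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of AMPI: copy exactly `len` bytes of a
 * value that is NOT NUL-terminated into `dst`, then terminate it.
 * Input longer than the buffer is truncated rather than overflowing. */
static void copy_value_terminated(char *dst, size_t dst_size,
                                  const char *src, size_t len)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    if (len >= dst_size) {
        len = dst_size - 1;
    }
    memcpy(dst, src, len);
    dst[len] = '\0';   /* without this step, stale bytes can follow the value */
}

/* Returns 1 if the first `len` bytes of `src` spell "sync", else 0. */
static int value_is_sync(const char *src, size_t len)
{
    char buf[32];
    copy_value_terminated(buf, sizeof buf, src, len);
    return strcmp(buf, "sync") == 0;
}
```

With a raw buffer holding the five bytes s, y, n, c, a, passing a length of 4 compares equal to "sync", while a length of 5 yields "synca" and is rejected, which mirrors the warning in the output above.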
From: unmobile AT gmail.com [mailto:unmobile AT gmail.com] On Behalf Of Phil Miller
Sam: It seems like it should be straightforward to add an assertion in our API entry/exit tracking sentries to catch this kind of issue. Essentially, it would need to check that the calling thread is actually an AMPI process thread that's supposed to be running. We should also document that PUP routines for AMPI code can't call MPI routines.
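As a rough picture of the kind of check suggested above, here is a minimal stand-alone C sketch. It assumes a per-thread flag that the runtime would set while a rank's user code is running; the names in_rank_thread, rank_thread_enter(), rank_thread_leave(), api_entry_ok(), and api_entry_assert() are all hypothetical and do not correspond to actual AMPI internals, whose entry/exit sentries may be organised quite differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-thread flag: true only while this thread is a rank
 * thread that is supposed to be running user code. */
static _Thread_local bool in_rank_thread = false;

/* Hypothetical hooks the scheduler would call around user code. */
static void rank_thread_enter(void) { in_rank_thread = true; }
static void rank_thread_leave(void) { in_rank_thread = false; }

/* Non-fatal form of the check: reports the offending routine and
 * returns false when called from the wrong context (for example from
 * a PUP routine executed by the runtime during migration). */
static bool api_entry_ok(const char *routine)
{
    if (!in_rank_thread) {
        fprintf(stderr, "%s called outside a running rank thread\n", routine);
        return false;
    }
    return true;
}

/* Asserting form, as an API entry sentry might use in debug builds. */
static void api_entry_assert(const char *routine)
{
    bool ok = api_entry_ok(routine);
    assert(ok && "MPI routine called from a non-rank context");
    (void)ok;   /* keeps release builds (NDEBUG) free of unused warnings */
}
```

Splitting the check into a reporting function and an asserting wrapper is only a design choice for this sketch: it lets the condition be exercised without aborting the process.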
On Thu, Nov 24, 2016 at 5:36 PM, Van Der Wijngaart, Rob F <rob.f.van.der.wijngaart AT intel.com> wrote: