charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
List archive
- From: Marcin Mielniczuk <marmistrz.dev AT zoho.eu>
- To: Sam White <white67 AT illinois.edu>
- Cc: "charm AT lists.cs.illinois.edu" <charm AT lists.cs.illinois.edu>
- Subject: Re: [charm] How to verify that AMPI load balancing works?
- Date: Mon, 3 Jun 2019 17:54:23 +0200
Hi Sam,

I understand that AMPI_Register_just_migrated is the proper way to do step 4? If so, then I confirm that the migration only happens once, on the first call to AMPI_Migrate, even though I do call AMPI_Migrate multiple times. This can be seen from the "trying to migrate" lines in the stdout and the following lines in the source code: https://github.com/marmistrz/heat_solver/blob/master/main.cpp#L286-L291

Regards,

On 03.06.2019 16:15, Sam White wrote:
Hi Marcin,
This is how we recommend enabling and testing dynamic load balancing in an AMPI program:

1. Insert periodic calls to AMPI_Migrate(...) with the MPI_Info for LB.
2. Link with "-memory isomalloc -module CommonLBs".
3. First run with "+balancer RotateLB +LBDebug 3".
4. Verify that multiple rounds of migrations are happening (every rank should migrate at each call to AMPI_Migrate()). The AMPI manual has info on how to print the current PE # that a rank is on.

Then you can experiment with other load balancing strategies and options.

You should not make calls to AMPI_Register_pup() when using Isomalloc for migration. Isomalloc is essentially a substitute for writing explicit PUP routines at the application level.

For your issue, are you sure that you are calling AMPI_Migrate() more than once? When running with +LBDebug, the LB strategy prints some info each time it is called, so the output in your log file suggests that it's not being called more than once for some reason. You may also want to run with +LBTestPeSpeed so that the LB framework takes into consideration the heterogeneity of the nodes you are running on.

Let me know if this helps or if you still see migration happening only once. We will improve the manual based on your feedback, so thanks for getting in touch with us!

-Sam

On Mon, Jun 3, 2019 at 8:45 AM
Marcin Mielniczuk <marmistrz.dev AT zoho.eu> wrote:
Hi,

I'm evaluating possible options for distributed computing in LAN networks and came across AMPI. Currently I'm trying to get some hands-on experience. I have a toy project to test on, which I have ported to AMPI. [1]

According to the AMPI documentation, no manual PUP routines should be needed when using Isomalloc, so I have only added an AMPI_Migrate call and an AMPI_Register_just_migrated handler. I'm not sure if this is correct, because all the AMPI examples seem to have manual PUP routines, even when using Isomalloc.

I'm trying to verify whether the processes are actually being migrated. It appears that the just_migrated handler is never called; moreover, when running with +LBDebug, the only CharmLB logs refer to the first call to AMPI_Migrate. It looks like the LB doesn't even consider migration later on. While for GreedyLB the load balancer may have decided that the load imbalance isn't large enough, I have also tried RotateLB, which should always migrate, and it appears not to. The behavior persists even if I add extra artificial CPU load on one of the nodes, which should cause a large load imbalance. All in all, it looks like AMPI doesn't migrate any process, even when run with a load balancer.

My command line is:

./charmrun +p12 -hostfile hostfile --mca btl_tcp_if_include <LAN subnet> ./heat_solver --size 14400 --steps 200 --noresults +vp60 +balancer RotateLB +LBDebug 100

My setup is: 2 computers in a common LAN (Ethernet) network, without a shared file system. One node has 4 CPUs, the other has 8 CPUs. The CPUs differ between the nodes: one is an Intel Core i7-6700 (4 GHz), the other an AMD Ryzen 7 1700 Eight-Core (3 GHz). I have attached the execution logs.

Is the lack of migration just a programming error on my side, or is it an AMPI bug?

Regards,
Marcin

[1] https://github.com/marmistrz/heat_solver
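[Editor's note: step 1 of the recommendation above can be sketched as follows. This is a minimal illustration, not code from the heat_solver project; it assumes the AMPI toolchain (ampicxx/charmrun), and the "ampi_load_balance" info key follows the AMPI manual (AMPI also predefines AMPI_INFO_LB_SYNC for the same purpose). The step counts and migration period are hypothetical placeholders.]

```cpp
// Periodic AMPI_Migrate() calls inside the main iteration loop.
// AMPI_Migrate() is collective over the ranks and may move the
// calling rank (a user-level thread) to another PE.
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    // MPI_Info hints telling AMPI to do synchronous load balancing.
    MPI_Info lb_hints;
    MPI_Info_create(&lb_hints);
    MPI_Info_set(lb_hints, "ampi_load_balance", "sync");

    const int steps = 200;     // total solver iterations (placeholder)
    const int lb_period = 20;  // call the LB every 20 steps (placeholder)
    for (int step = 0; step < steps; ++step) {
        // ... one iteration of the computation ...
        if (step > 0 && step % lb_period == 0) {
            AMPI_Migrate(lb_hints);  // LB strategy runs; ranks may migrate
        }
    }

    MPI_Info_free(&lb_hints);
    MPI_Finalize();
    return 0;
}
```

Per steps 2 and 3 above, this would be built and run along the lines of "ampicxx main.cpp -o heat_solver -memory isomalloc -module CommonLBs" and "./charmrun +p12 ./heat_solver +vp60 +balancer RotateLB +LBDebug 3"; with Isomalloc linked in, no PUP routines or AMPI_Register_pup() calls are needed.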
- [charm] How to verify that AMPI load balancing works?, Marcin Mielniczuk, 06/03/2019
- Re: [charm] How to verify that AMPI load balancing works?, Sam White, 06/03/2019
- Re: [charm] How to verify that AMPI load balancing works?, Marcin Mielniczuk, 06/03/2019
- Re: [charm] How to verify that AMPI load balancing works?, Sam White, 06/03/2019
- Re: [charm] How to verify that AMPI load balancing works?, Marcin Mielniczuk, 06/03/2019
- Re: [charm] How to verify that AMPI load balancing works?, Sam White, 06/03/2019