Thanks, Phil! Yeah, my statement about derived communicators was too broad, but in my app I do indeed use MPI_Comm_split to create communicators.
Rob
From: Phil Miller [mailto:unmobile AT gmail.com]
Sent: Tuesday, November 22, 2016 12:17 PM
To: Van Der Wijngaart, Rob F <rob.f.van.der.wijngaart AT intel.com>; White, Samuel T <white67 AT illinois.edu>
Cc: Totoni, Ehsan <ehsan.totoni AT intel.com>; Langer, Akhil <akhil.langer AT intel.com>; Harshitha Menon <harshitha.menon AT gmail.com>; charm AT cs.uiuc.edu; Kavitha Chandrasekar <kchndrs2 AT illinois.edu>
Subject: RE: [charm] When to migrate
Sam should be better able to answer your exact query. In brief, that remark in the manual is specifically about MPI_Comm_split_type, which is used to get a subcommunicator with physical commonality. It doesn't affect derived communicators in general.
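For concreteness, the two calls differ like this (standard MPI-3; the variable names and the color/key choices are just illustrative):

    int my_rank, color, key;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    color = my_rank % 2;   /* example grouping only */
    key   = my_rank;

    /* A plain derived communicator: stays valid across AMPI migration. */
    MPI_Comm row_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, key, &row_comm);

    /* A communicator grouped by a physical resource (here, shared
       memory). Migration can move ranks off the node and break that
       grouping, so AMPI invalidates it; only MPI_Comm_free may then
       be called on it. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);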
On Nov 22, 2016 2:03 PM, "Van Der Wijngaart, Rob F" <rob.f.van.der.wijngaart AT intel.com> wrote:
Hello Kavitha,
I was just talking with Ehsan and Akhil about the logistics of dynamic load balancing in Adaptive MPI applications; see below. Can you give me an update on the status of the meta-balancer?
Meanwhile, I ran into a funny issue with my application. I am using MPI_Comm_split to create multiple communicators. This is what I read in the Adaptive MPI manual:
Note that migrating ranks around the cores and nodes of a system can change which ranks share physical resources, such as memory. A consequence of this is that communicators created via MPI_Comm_split_type are invalidated by calls to AMPI_Migrate that result in migration which breaks the semantics of that communicator type. The only valid routine to call on such communicators is MPI_Comm_free.
We also provide callbacks that user code can register with the runtime system to be invoked just before and right after migration: AMPI_Register_about_to_migrate and AMPI_Register_just_migrated, respectively. Note that the callbacks are only invoked on those ranks that are about to actually migrate or have just actually migrated.
So is the idea that before a migration I call MPI_Comm_free on derived communicators and reconstitute the communicators after the migration by reinvoking MPI_Comm_split?
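Concretely, I have in mind something like this (just a sketch; I'm assuming the no-argument form of AMPI_Migrate, and color/key stand for whatever my app already passes to MPI_Comm_split):

    /* Before the collective migration point: release the derived comm. */
    MPI_Comm_free(&my_comm);

    AMPI_Migrate();   /* collective; ranks may move between PEs here */

    /* Afterwards: reconstitute it collectively on all ranks. The
       registered callbacks fire only on ranks that actually migrate,
       so they cannot host a collective call like MPI_Comm_split. */
    MPI_Comm_split(MPI_COMM_WORLD, color, key, &my_comm);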
Thanks!
Rob
The person working on it (Harshitha) has left recently and I don’t know who picked up the work. I suggest sending an email to the mailing list. Hopefully the meta-balancer is in usable shape.
-Ehsan
Thanks, Ehsan! Indeed, my workload is iterative. The structure is as follows:
for (t=0; t<T; t++) {
  if (t%period < duration && criterion(my_rank)) do_extra_work();
  do_regular_work();
}
So whenever the time step is a multiple of the period, some ranks (depending on the criterion function) start doing extra work for the next duration steps. As you can see, there is a hierarchy in the iterative workload behavior.
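To make my original question about placing the migrate call concrete, the two placements I have been weighing look roughly like this (a sketch only, again assuming the no-argument AMPI_Migrate):

    for (t=0; t<T; t++) {
      /* Option A: migrate just before the load pattern changes. */
      if (t%period == 0) AMPI_Migrate();
      if (t%period < duration && criterion(my_rank)) do_extra_work();
      do_regular_work();
      /* Option B: migrate here instead, right after the burst ends,
         i.e. on the step where t%period == duration-1. */
    }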
Whom should I contact about the meta-balancer?
Thanks again!
Rob
Hi Rob,
If the workload is iterative, where in the iteration AMPI_Migrate() is called shouldn’t matter in principle for measurement-based load balancing. Of course, there are tricky cases where this doesn’t work (a few long, varying iterations, etc.). There is also a meta-balancer that automatically decides how often load balancing should be invoked and which load balancer to use. I can’t find it in the manual, so I suggest sending them an email to ask them to document it :)
Is your workload different from that of typical iterative applications?
Best,
Ehsan
p.s. MPI_Migrate() has been renamed to AMPI_Migrate(); the MPI_ prefix is no longer used for AMPI-specific calls.
Hi Akhil and Ehsan,
I have a silly question. I put together a workload designed to test the capabilities of runtimes to do dynamic load balancing. It’s a very controlled environment: for a while nothing happens to the load, but at discrete points in time I either remove work from or add work to an MPI rank. Depending on the strategy chosen, this can be quite dramatic, such as a rank having no work to do at all for a while, then picking up a chore, and after a while dropping that chore again. I am adding PUP routines and migrate calls to the workload to test it using Adaptive MPI. The question is when I should invoke MPI_Migrate: just before the load per rank changes, or right after? Because the period during which I add or remove work from a rank could be short, this could make quite a difference. The workload is cyclic, so the runtime can, in principle, learn from load changes in the past.
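For what it’s worth, the PUP part currently looks roughly like this (a condensed sketch, not my actual code; I’m using the C PUP interface from pup_c.h and assuming AMPI_Register_pup is the right registration entry point; work_t stands in for my per-rank state):

    #include <stdlib.h>
    #include <mpi.h>
    #include "pup_c.h"

    typedef struct { int n; double *data; } work_t;

    work_t work;     /* the rank's migratable state */
    int pup_idx;

    /* Sizes, packs, or unpacks the state, depending on the PUP phase. */
    void pup_work(pup_er p, void *d) {
      work_t *w = (work_t *)d;
      pup_int(p, &w->n);
      if (pup_isUnpacking(p))               /* arriving on the new PE */
        w->data = (double *)malloc(w->n * sizeof(double));
      pup_doubles(p, w->data, w->n);
      if (pup_isDeleting(p))                /* leaving the old PE */
        free(w->data);
    }

    /* Called once per rank after MPI_Init: */
    void register_state(void) {
      AMPI_Register_pup(pup_work, &work, &pup_idx);
    }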
Thanks for any advice you can offer!
Rob