charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
- From: Evghenii Gaburov <e-gaburov AT northwestern.edu>
- To: Gengbin Zheng <zhenggb AT gmail.com>
- Cc: Eric Bohm <ebohm AT illinois.edu>, "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
- Subject: Re: [charm] [ppl] load balancer question (freeze/crash)
- Date: Tue, 4 Oct 2011 17:37:04 +0000
- Accept-language: en-US
- List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
- List-id: CHARM parallel programming system <charm.cs.uiuc.edu>
> Make sure you call the parent when you overload those two functions,
> something like the following:
>
> void ckAboutToMigrate() { CBase_LB_Test::ckAboutToMigrate(); }
> void ckJustMigrated() { CBase_LB_Test::ckJustMigrated(); }
Okay, that solves the problem.
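
The effect of the empty overrides can be illustrated with a toy model. This is a minimal sketch in plain C++, not Charm++ itself: MockBase is a hypothetical stand-in for the generated CBase_LB_Test class, and its `registered` flag is an assumed placeholder for whatever bookkeeping the real base-class hooks perform. It only shows why an override that does not call the parent can leave that bookkeeping undone.

```cpp
#include <cassert>

// Hypothetical stand-in for the generated base class (CBase_LB_Test in this
// thread). The `registered` flag is an assumption made for illustration only.
struct MockBase {
  bool registered = false;  // a freshly constructed element is not registered yet
  virtual ~MockBase() = default;
  virtual void ckAboutToMigrate() { registered = false; }  // leave the old processor
  virtual void ckJustMigrated() { registered = true; }     // join the new processor
};

// Empty overrides, as in the test program: the base-class work never runs.
struct EmptyHooks : MockBase {
  void ckAboutToMigrate() override {}
  void ckJustMigrated() override {}
};

// Overrides that call the parent first, then do their own work.
struct ForwardingHooks : MockBase {
  int userHookCalls = 0;  // placeholder for application-specific work
  void ckAboutToMigrate() override { MockBase::ckAboutToMigrate(); ++userHookCalls; }
  void ckJustMigrated() override { MockBase::ckJustMigrated(); ++userHookCalls; }
};

// Mock migration: run the departure hook, build a fresh element on the
// "destination", run the arrival hook, and report the bookkeeping state.
template <class Element>
bool registeredAfterMockMigration() {
  Element source;
  source.ckAboutToMigrate();
  Element arrived;  // stands in for the migration constructor plus pup()
  arrived.ckJustMigrated();
  assert(!source.registered);  // nothing re-registers the departed copy
  return arrived.registered;
}
```

In this model, registeredAfterMockMigration<EmptyHooks>() reports an element that never re-registers, while the ForwardingHooks variant does. Whether the real runtime fails in exactly this way is not something the toy model can establish.
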
> For your production code, make sure you write pup functions that
> pack/unpack all class variables.
Even temporary variables that are reconstructed during the PUP process and,
in principle, do not require migration? Could it cause deadlocks if I do not
PUP them?
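
One pattern that may be relevant to this question, offered as an assumption rather than as the answer given in this thread: persistent members go through pup(), while derived values are rebuilt on the receiving side when the pupper is unpacking. The sketch below is self-contained plain C++. MockPup is a hypothetical stand-in for the Charm++ pupper (only the isUnpacking() name mirrors the real interface), and MockElement, callbackId (standing in for MainCB) and totalMass are made-up names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <numeric>
#include <vector>

// Hypothetical byte-buffer pupper; a stand-in, not the Charm++ PUP::er class.
class MockPup {
 public:
  MockPup(std::vector<char>& buffer, bool unpacking)
      : buffer_(buffer), unpacking_(unpacking) {}
  bool isUnpacking() const { return unpacking_; }

  // Copy one trivially copyable value into or out of the buffer.
  template <class T>
  void raw(T& value) {
    if (unpacking_) {
      assert(pos_ + sizeof(T) <= buffer_.size());
      std::memcpy(&value, buffer_.data() + pos_, sizeof(T));
      pos_ += sizeof(T);
    } else {
      const char* bytes = reinterpret_cast<const char*>(&value);
      buffer_.insert(buffer_.end(), bytes, bytes + sizeof(T));
    }
  }

 private:
  std::vector<char>& buffer_;
  std::size_t pos_ = 0;
  bool unpacking_;
};

struct MockElement {
  int step = 0;              // persistent: must travel with the element
  int callbackId = -1;       // persistent: stands in for the MainCB callback
  std::vector<double> mass;  // persistent payload
  double totalMass = 0.0;    // derived: rebuilt after unpacking, never sent

  void rebuildDerived() {
    totalMass = std::accumulate(mass.begin(), mass.end(), 0.0);
  }

  void pup(MockPup& p) {
    p.raw(step);
    p.raw(callbackId);
    std::size_t n = mass.size();
    p.raw(n);
    if (p.isUnpacking()) mass.resize(n);
    for (double& m : mass) p.raw(m);
    if (p.isUnpacking()) rebuildDerived();  // reconstruct instead of migrating
  }
};

// Pack the source into a buffer, then unpack into a fresh element.
MockElement mockMigrate(MockElement& source) {
  std::vector<char> buffer;
  MockPup packer(buffer, false);
  source.pup(packer);
  MockElement arrived;
  MockPup unpacker(buffer, true);
  arrived.pup(unpacker);
  return arrived;
}
```

Any member left out of pup() and not rebuilt in the unpacking branch keeps its default value in the arrived copy, which in this sketch would be callbackId == -1.
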
> Also look at possible race conditions in the code. For example, after
> calling AtSync() (assuming you are using periodic load balancing), the
> caller should not send new messages. It should wait for the
> ResumeFromSync() call.
Okay, I will double check that.
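
The rule quoted above (send nothing between AtSync() and ResumeFromSync()) can be sketched as a small gate. This is a toy model in plain C++ under stated assumptions: MockWorker, mockAtSync(), mockResumeFromSync() and the deferred queue are hypothetical names, and the real calls hand control to the load balancer rather than flipping a flag.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical element that holds back outgoing messages while it waits
// for the load balancer to finish.
struct MockWorker {
  bool waitingForResume = false;
  std::vector<std::string> sent;      // messages that actually went out
  std::vector<std::string> deferred;  // messages held until resume

  void send(const std::string& message) {
    if (waitingForResume) {
      deferred.push_back(message);  // do not send during load balancing
    } else {
      sent.push_back(message);
    }
  }

  // Stand-in for calling AtSync(): from here on, nothing new is sent.
  void mockAtSync() { waitingForResume = true; }

  // Stand-in for the ResumeFromSync() entry: release everything held back.
  void mockResumeFromSync() {
    waitingForResume = false;
    for (const std::string& message : deferred) sent.push_back(message);
    deferred.clear();
  }
};
```

Queuing is only one possible design. Restructuring the code so that no send is attempted until the resume arrives would satisfy the same rule.
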
Thanks,
Evghenii
>
> Gengbin
>
> On Tue, Oct 4, 2011 at 10:43 AM, Evghenii Gaburov
> <e-gaburov AT northwestern.edu>
> wrote:
>>> This program does not PUP the MainCB callback member variable.
>>> Variables which are not PUP'd will not retain their value after
>>> migration. Therefore every migrated element will be calling an
>>> uninitialized callback in ResumeFromSync.
>> So, the freeze still occurs even after MainCB is passed to PUP.
>>
>> The test program I posted in the previous listing sometimes freezes with
>> Greedy[Comm]LB, Refine[Comm]LB & MetisLB, but not with RotateLB, when
>> ckAboutToMigrate() & ckJustMigrated() are defined:
>>
>> #if 1
>> void ckAboutToMigrate() {}
>> void ckJustMigrated() {}
>> #endif
>>
>> Any idea what may be happening here?
>>
>> While in my simulation code I do not use these, I still experience freezes
>> at ResumeFromSync() after the code has run for about an hour and after
>> dozens of AtSync() calls. I cannot reproduce this behaviour in that simple
>> test code, but maybe this is related to the fact that in production code
>> I move a lot of data...
>>
>> Any help will be of great value!
>>
>> Cheers,
>> Evghenii
>>
>>
>>
>> --
>> Evghenii Gaburov,
>> e-gaburov AT northwestern.edu
--
Evghenii Gaburov,
e-gaburov AT northwestern.edu