charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
- From: Phil Miller <mille121 AT illinois.edu>
- To: Robert Steinke <rsteinke AT uwyo.edu>, Charm Mailing List <charm AT cs.illinois.edu>
- Subject: Re: [charm] Program hang when using load balancing and lots of PEs
- Date: Tue, 27 Jan 2015 17:35:33 -0600
- List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
- List-id: CHARM parallel programming system <charm.cs.uiuc.edu>
Thanks for the output.
How often are your objects calling AtSync()?
Could you double-check the correctness and completeness of your objects' pup() routines? In particular, check that each one handles the object's own members, calls CBase_foo::pup(p), and (if applicable) calls __sdag_pup(p). If something turns out to be missing, that can easily cause a hang.
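To illustrate why an incomplete pup() routine matters, here is a minimal sketch in plain C++ that does not use Charm++ at all. MiniPupper, CompleteElement, IncompleteElement, and migrate are hypothetical names invented for this illustration; MiniPupper is only a stand-in for the pack/unpack role that PUP::er plays and does not reproduce the real API. Because there is no chare base class here, the sketch also leaves out the CBase_foo::pup(p) and __sdag_pup(p) calls mentioned above. It shows only that a member left out of pup() silently reverts to its default value after a simulated migration, which is one way an object could end up waiting forever after load balancing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pack/unpack helper. NOT the Charm++ PUP::er API.
class MiniPupper {
public:
    enum class Mode { Packing, Unpacking };

    explicit MiniPupper(Mode m) : mode_(m) {}
    MiniPupper(Mode m, std::vector<unsigned char> bytes)
        : mode_(m), buf_(std::move(bytes)) {}

    // Packing appends the bytes of value; unpacking overwrites value
    // with the next sizeof(T) bytes from the buffer.
    template <typename T>
    void operator|(T& value) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "this sketch handles plain data members only");
        if (mode_ == Mode::Packing) {
            const unsigned char* src =
                reinterpret_cast<const unsigned char*>(&value);
            buf_.insert(buf_.end(), src, src + sizeof(T));
        } else {
            assert(pos_ + sizeof(T) <= buf_.size());
            std::memcpy(&value, buf_.data() + pos_, sizeof(T));
            pos_ += sizeof(T);
        }
    }

    const std::vector<unsigned char>& bytes() const { return buf_; }

private:
    Mode mode_;
    std::vector<unsigned char> buf_;
    std::size_t pos_ = 0;
};

// Every member is passed through pup(), so all state survives migration.
struct CompleteElement {
    int iteration = 0;
    double currentTime = 0.0;
    bool needsSync = false;

    void pup(MiniPupper& p) {
        p | iteration;
        p | currentTime;
        p | needsSync;
    }
};

// needsSync is missing from pup(), so it is lost on migration.
struct IncompleteElement {
    int iteration = 0;
    double currentTime = 0.0;
    bool needsSync = false;

    void pup(MiniPupper& p) {
        p | iteration;
        p | currentTime;
        // BUG (deliberate): needsSync is never packed or unpacked.
    }
};

// Simulates migration: pack the original, then unpack into a
// default-constructed object, as if it arrived on another processor.
template <typename Element>
Element migrate(Element& original) {
    MiniPupper packer(MiniPupper::Mode::Packing);
    original.pup(packer);

    MiniPupper unpacker(MiniPupper::Mode::Unpacking, packer.bytes());
    Element copy;
    copy.pup(unpacker);
    return copy;
}
```

Calling migrate() on a CompleteElement with needsSync set to true returns a copy that still has it set, while the same call on an IncompleteElement returns a copy with needsSync back at false. Comparing the member list of each class against its pup() body line by line is a cheap way to rule this class of bug out.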
On Tue, Jan 27, 2015 at 5:29 PM, Robert Steinke <rsteinke AT uwyo.edu> wrote:
I'm attaching a file from a run with +LBDebug 3. Within a few minutes it had gotten to the line:
currentTime = 52.970963, dt = 18.112661, iteration = 7.
Then it produced no further output. I killed the job 30 minutes later, and everything in the file after that line was printed only once I killed the job.
There are two chare arrays. One has 36452 elements, and the other has 11832 elements.
On 01/27/2015 03:14 PM, Phil Miller wrote:
The first thing to try would be running with the option "+LBDebug 3" to get some visibility into what's happening in the LB infrastructure. Could you send us output from such a run?
Also, how many objects are you running with across the whole job?
On Tue, Jan 27, 2015 at 3:51 PM, Robert Steinke <rsteinke AT uwyo.edu> wrote:
I have a program that hangs when I run on lots of PEs and use the load balancer (I'm using MetisLB). If I run on 512 or fewer processors it is fine. If I try to run on 1024 processors it hangs shortly after I call CkStartLB() (I'm using TurnManualLBOn). Also, if I don't call CkStartLB(), it runs fine on 1024 processors.
Is this a problem that someone else has encountered before?
Should I try to dig into this myself, or is there someone more familiar with the load balancer than I am who is willing to look into it? In the latter case, I will put my effort into creating a minimal test case that reproduces the problem.
Thanks
Bob Steinke