- From: Robert Steinke <rsteinke AT uwyo.edu>
- To: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
- Subject: [charm] intermittent hang on reduction
- Date: Wed, 18 Feb 2015 11:39:28 -0700
- List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
- List-id: CHARM parallel programming system <charm.cs.uiuc.edu>
I've been having an intermittent problem where my code hangs. I've traced where it happens: the problem is with a reduction. Every array element calls contribute, but the reduction target never gets called. It's intermittent, but when it does occur, it is always the first time the reduction is performed after load balancing. It happens more often on larger numbers of processors.
What is the best way to debug this? Can I look at what the Charm code thinks the state of the reduction is such as how many elements have contributed and how many are expected to contribute? Is there a trace level option I should use, or is there somewhere in the .def.h code where I should stick a breakpoint?
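One way to see the bookkeeping asked about above (how many elements have contributed versus how many are expected) is to mirror it at the application level while debugging. Below is a minimal standard C++ sketch. It assumes a 1-D chare array whose element indices are 0..N-1, and it assumes the per-element records have already been collected in one place, for example by having each element print its index and reduction round with CkPrintf just before it calls contribute, then parsing that output. ReductionTracker, recordContribution and missing are hypothetical names, not part of Charm++. The sketch does not read Charm++'s internal reduction state.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical debugging helper (NOT part of the Charm++ API).
// Records which array element indices contributed to each reduction round
// so the elements that never contributed can be listed when a round hangs.
// Assumes element indices are dense in [0, numElements) and rounds are >= 0.
class ReductionTracker {
 public:
  explicit ReductionTracker(int numElements) : numElements_(numElements) {}

  // Record that element `index` contributed to reduction round `round`.
  // Duplicate records for the same (round, index) are counted once.
  void recordContribution(int round, int index) {
    assert(round >= 0 && index >= 0 && index < numElements_);
    if (round >= static_cast<int>(contributed_.size())) {
      contributed_.resize(round + 1);
    }
    contributed_[round].insert(index);
  }

  // Number of distinct elements that have contributed to `round`.
  int count(int round) const {
    if (round < 0 || round >= static_cast<int>(contributed_.size())) return 0;
    return static_cast<int>(contributed_[round].size());
  }

  // Indices in [0, numElements) that have not contributed to `round`.
  std::vector<int> missing(int round) const {
    std::vector<int> result;
    for (int i = 0; i < numElements_; ++i) {
      bool seen = round >= 0 &&
                  round < static_cast<int>(contributed_.size()) &&
                  contributed_[round].count(i) != 0;
      if (!seen) result.push_back(i);
    }
    return result;
  }

 private:
  int numElements_;
  std::vector<std::set<int>> contributed_;
};
```

Comparing the missing indices for the first round after load balancing against which elements migrated could show whether the lost contributions come from elements that moved. This is a guess at a useful check, not a confirmed diagnosis.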
thanks,
Bob Steinke