charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
List archive
- From: Robert Bird <r.bird AT warwick.ac.uk>
- To: charm AT cs.illinois.edu
- Subject: [charm] Process not consuming messages
- Date: Wed, 20 Aug 2014 18:02:48 -0600
- List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
- List-id: CHARM parallel programming system <charm.cs.uiuc.edu>
Hi all,
I have an iterative code that can deadlock during parallel operation.
It seems that all chares associated with one "node" (CkMyNode()) stop being scheduled.
In the log below, @<number> denotes the chare array index and (<number>) the time step. Chare 440 never consumes the messages sent to it:
CharmPatch.cpp:1473 @440 (21) distributeGhostCells >> done loop ghost send
CharmPatch.def.h:3295 @440 (21) _atomic_7 >> finished sending ghosts, waiting for 128 ghosts with tag (rg = 43)
CharmPatch.cpp:1255 @441 (21) distributeGhostCells >> Sending to 440 in direction 2 (d=43)
CharmPatch.cpp:1457 @107 (21) distributeGhostCells >> Sending to 440 in direction 3 (d=43)
CharmPatch.cpp:1255 @442 (21) distributeGhostCells >> Sending to 440 in direction 8 (d=43)
CharmPatch.cpp:1255 @434 (21) distributeGhostCells >> Sending to 440 in direction 12 (d=43)
I have grepped out all the chares that exhibit this behaviour, and within any given run they all map to the same CkMyNode().
The SDAG code that should consume those messages is the following:
while (receivedGhostsCount < totalGhosts)
{
  when SDAGreceiveGhostCells[(step*2)+1](int direction, CharmGhostBuffer ghost, int sender_id)
  {
    atomic
    {
      // consume the ghost (this must increment receivedGhostsCount for the loop to exit)
    }
  }
}
Does anyone have any idea what might cause this?
The only explanation I can think of is that another scheduled object (a "node group") mapped to the same physical node has stalled, perhaps in an infinite loop, and is preventing these chares from being scheduled. However, the behaviour above is the only anomaly I have found so far.
Best Regards,
Bob
Robert Bird
http://go.warwick.ac.uk/robertbird
+44 (0)24 7652 2863
CS202, High Performance Lab
Department of Computer Science
University of Warwick
- [charm] Process not consuming messages, Robert Bird, 08/20/2014
- Re: [charm] Process not consuming messages, Phil Miller, 08/21/2014