charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
- From: Robert Steinke <rsteinke AT uwyo.edu>
- To: <charm AT cs.uiuc.edu>
- Subject: Re: [charm] messages not being received
- Date: Mon, 6 Oct 2014 17:08:36 -0600
- List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
- List-id: CHARM parallel programming system <charm.cs.uiuc.edu>
I've been working on my problem where I send messages to an entire chare array and some messages don't arrive.
I've been trying to create a minimal example that exhibits the problem. I've gotten down to about 2000 lines of code. I can't see any bugs in my code. Would anyone be willing to take a look at it or try to debug it on your system?
I am running on CentOS6 with the newly released Charm 6.6.0. The build is mpi-linux-x86_64, and the MPI is mpich-3.0.1. The problem shows up when I run on only one processing element. I haven't tried it on more.
The problem depends on a neighbor graph that is read in from a file. At the start, each chare initializes itself and then sends an initialization message to its neighbors. These messages all arrive, but when I then try to send a subsequent message to every element of the chare array, some elements don't receive it. If I use hardcoded neighbor relationships, such as each element being connected to the ones numerically before and after it, the problem doesn't occur. When I use the neighbor graph from the file, it does. Reading the file is not itself the cause: I can read the file, overwrite the neighbor values with hardcoded ones, and the problem doesn't occur.
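For reference, the hardcoded "before and after" relationship can be sketched in plain C++ as below. This is only an illustration: the function name chainNeighbors is hypothetical, and the assumption that the first and last elements have a single neighbor (no wrap-around) is mine, not something taken from the attached code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not from the attached code: returns the neighbors of
// element `index` in a chain of `count` elements, where each element is
// connected to the ones numerically before and after it.
// Assumption: the ends of the chain do not wrap around.
std::vector<int> chainNeighbors(int index, int count)
{
  assert(0 <= index && index < count);

  std::vector<int> neighbors;

  if (index > 0)
    {
      neighbors.push_back(index - 1); // the element numerically before
    }

  if (index < count - 1)
    {
      neighbors.push_back(index + 1); // the element numerically after
    }

  return neighbors;
}
```

Overwriting the values read from the file with the result of a function like this is the kind of substitution under which the problem goes away.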
I've attached the code, but the file with the neighbor relationships is a 6GB netCDF file. I can send it to whoever is willing to work on the problem. You will need to have the NetCDF library to link against my code.
Thanks,
Bob Steinke
On 10/03/2014 03:46 PM, Robert Steinke wrote:
I'm having a problem with my charm application.
Before I get into the problem: I tried to use the ccs_tools charm debugger, but haven't been able to yet. I read in the manual that it only works for net-* versions of charm, and I am running on an mpi-* version. Getting my code to run on a net-* version started to turn into a real mess. For example, I'm using the parallel version of the NetCDF library, which requires MPI. I could probably get it running on a net-* version, but my first question is whether that's the right road to be going down. Is it likely the ccs_tools debugger will be useful for solving this problem, or is there something else I can do?
Here's the problem:
In an entry method of one object I have a loop that sends out messages to every element of a chare array. I'm sending an individual message to each object in a loop, not a broadcast through the array proxy, because I need to send different parameters to each object. Like this:
for (ii = 0; ii < proxySize; ii++)
  {
    proxy[ii].message(parameters[ii]);
  }
When proxySize is large and I send a lot of messages (about 37,000), a couple of percent of them never arrive. The missing messages are scattered around the array. When I send a small number of messages they all arrive.
Has anyone experienced something like this before?
I was hoping that the ccs_tools debugger could show me things like the queued messages, so that I could watch messages being sent and received and tell whether charm is really failing to deliver them or whether I'm doing something wrong. Is this something that ccs_tools could show me?
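Short of a debugger, the same question can be approached with simple bookkeeping. The following is a minimal sketch in plain C++ with no Charm++ calls; the class DeliveryLog and its methods markSent, markReceived, and missing are hypothetical names made up for illustration. The idea is to count a send for each array index in the sending loop, count a receive at the top of the receiving entry method, and then list the indices whose messages never showed up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bookkeeping class, not part of Charm++: tallies how many
// messages were sent to and received by each array index.
class DeliveryLog
{
public:
  explicit DeliveryLog(std::size_t size) : sent(size, 0), received(size, 0) {}

  // Call once per message, next to the send in the loop.
  void markSent(std::size_t index)
  {
    assert(index < sent.size());
    ++sent[index];
  }

  // Call once per message, at the top of the receiving entry method.
  void markReceived(std::size_t index)
  {
    assert(index < received.size());
    ++received[index];
  }

  // Returns the indices that have received fewer messages than were sent.
  std::vector<std::size_t> missing() const
  {
    std::vector<std::size_t> result;

    for (std::size_t ii = 0; ii < sent.size(); ++ii)
      {
        if (received[ii] < sent[ii])
          {
            result.push_back(ii);
          }
      }

    return result;
  }

private:
  std::vector<int> sent;     // number of messages sent to each index
  std::vector<int> received; // number of messages received by each index
};
```

On a single processing element, I assume one process-wide instance would see every call. On more than one, the counts would have to be gathered across processes, which this sketch does not attempt.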
Thanks for the help,
Bob Steinke
Attachment:
charm_error.tar.gz
Description: GNU Zip compressed data