charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
- From: Sameer Kumar <sameermanepalli AT gmail.com>
- To: Nitin Bhat <nitin AT hpccharm.com>
- Cc: "charm AT lists.cs.illinois.edu" <charm AT lists.cs.illinois.edu>
- Subject: Re: [charm] PAMI_Context_advance throws an error after PAMI_Rput call
- Date: Thu, 14 Sep 2017 05:15:23 -0400
Hello,
I am getting an error while working with RDMA calls in the PAMI communication library on BG/Q Vesta and need help debugging it.
I get the error when I build charm with "./build charm++ pamilrts-bluegeneq --with-production -j16 -g". Surprisingly, the error does not reproduce when I build charm with -O0 optimization: "./build charm++ pamilrts-bluegeneq -j16 -O0 -g".
Specifically, the crash is at PAMI_Context_advance, which is called sometime after calling PAMI_Rput. I have attached the job output and the stack trace that I obtained from bgqstack.
I see that the error occurs after the completion function executes (done_fn that I pass to the PAMI_Rput call).
Additionally, the stack trace reveals that the error occurs at /bgsys/source/srcV1R2M4.29840/comm/sys/buildtools/pami/p2p/protocols/rput/PutRdma.h:149, which is a call to the complete_simple() method, and the line which shows the error is
put->simple.done_fn (context, put->simple.cookie, PAMI_SUCCESS);
But I’m not sure why the invocation of the done function is throwing an error. I am not doing anything specific in the done_fn other than printing “completion fn beg” and “completion fn end”, as seen in the job output.
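To make the sequence above concrete, here is a minimal mock of the completion-callback pattern: an operation is posted with a done function and a cookie, and a later "advance" step invokes that function. This is only a sketch and does not use the PAMI headers. Every `mock_*` name, `completion_fn`, and `run_example` is hypothetical and invented for illustration, and the sketch makes no claim about the cause of the crash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the library types; NOT the real PAMI API. */
typedef void *mock_context_t;
typedef enum { MOCK_SUCCESS = 0 } mock_result_t;
typedef void (*mock_done_fn)(mock_context_t ctx, void *cookie,
                             mock_result_t result);

/* One pending one-sided operation: a callback plus its opaque argument. */
typedef struct {
    mock_done_fn done_fn;  /* must be a valid function pointer */
    void        *cookie;   /* must stay valid until the callback runs */
    int          pending;  /* 1 while the operation has not completed */
} mock_put_t;

/* Completion callback: prints the two markers and counts one completion
 * through the cookie. */
void completion_fn(mock_context_t ctx, void *cookie, mock_result_t result)
{
    (void)ctx;
    printf("completion fn beg\n");
    if (result == MOCK_SUCCESS)
        ++*(int *)cookie;
    printf("completion fn end\n");
}

/* Mock "advance": if the operation is still pending, mark it complete and
 * invoke its callback. Returns the number of events processed (0 or 1). */
int mock_advance(mock_context_t ctx, mock_put_t *put)
{
    if (!put->pending)
        return 0;
    put->pending = 0;
    assert(put->done_fn != NULL);  /* guard against an unset callback */
    put->done_fn(ctx, put->cookie, MOCK_SUCCESS);
    return 1;
}

/* Post one operation and advance twice; returns how many times the
 * callback ran. */
int run_example(void)
{
    int completions = 0;
    mock_put_t put = { completion_fn, &completions, 1 };
    mock_advance(NULL, &put);  /* fires the callback */
    mock_advance(NULL, &put);  /* nothing pending, so a no-op */
    return completions;
}
```

In this mock, `run_example()` returns 1 because the callback fires exactly once. The general point it illustrates is that the function pointer and the cookie are read at advance time, not at post time, so both have to remain valid until then.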
Interestingly, things work just fine and I don’t see any crash at PAMI_Context_advance when my program uses PAMI_Rget (instead of PAMI_Rput).
Any pointers for debugging this error? Are there any restrictions on calling PAMI_Context_advance after Rput/Rget calls?
Thanks,
Nitin Bhat
Software Engineer,
Charmworks Inc.
<stack_trace.txt>
<job_output.txt>