charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
- From: Christian Perez <christian.perez AT inria.fr>
- To: gzheng AT illinois.edu
- Cc: Filippo Gioachin <gioachin AT uiuc.edu>, "kale AT illinois.edu" <kale AT illinois.edu>, "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>, "Kale, Laxmikant V" <kale AT cs.uiuc.edu>, Gengbin Zheng <zhenggb AT gmail.com>
- Subject: Re: [charm] how to suspend/resume a chare ?
- Date: Mon, 31 May 2010 13:53:04 +0200
- List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
- List-id: CHARM parallel programming system <charm.cs.uiuc.edu>
- Organization: INRIA/LIP
Yes, removing the virtual base class fixes the problem so far. Thanks!
Quite complex component assembly examples are now working with
primitive components in Charm++!
Christian
On 05/28/2010 04:57 PM, Gengbin Zheng wrote:
ok, the virtual base class seems to be a problem here. If you remove
it, it works.
This exposes a bug in charm on how a plain chare is created and
registered in the chare table. I checked in a fix now, please try it
and let us know if it fixes the problem.
Gengbin
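A minimal sketch of why a virtual base class can matter when objects are kept in a table of untyped pointers. This illustrates the general C++ behaviour only; the thread does not say what the actual Charm++ fix was, and the names Base, Derived, base_offset_in_derived and lookup_id_via_base_pointer are hypothetical, not Charm++ API. With virtual inheritance, as in "class SingleProvider : virtual public CBase_SingleProvider", the base subobject usually does not start at the same address as the derived object, so a void* has to be converted back to the exact pointer type it was created from.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; none of these names come from Charm++.
struct Base {
    int id = 42;
    virtual ~Base() = default;
};

// Virtual inheritance, mirroring the shape of the class in the thread.
struct Derived : virtual Base {
    int payload = 7;
};

// Byte offset of the Base subobject inside a Derived object.
// On common ABIs (for example the Itanium C++ ABI used by GCC on Linux)
// this is non-zero for a virtual base that has data members.
std::ptrdiff_t base_offset_in_derived() {
    Derived d;
    auto derived_addr =
        reinterpret_cast<std::uintptr_t>(static_cast<void*>(&d));
    auto base_addr = reinterpret_cast<std::uintptr_t>(
        static_cast<void*>(static_cast<Base*>(&d)));
    return static_cast<std::ptrdiff_t>(base_addr - derived_addr);
}

// Safe pattern: convert to Base* first, then erase the type, and later
// cast the void* back to the same Base* type.
int lookup_id_via_base_pointer() {
    Derived d;
    std::vector<void*> table;
    table.push_back(static_cast<Base*>(&d));  // store the Base* address
    Base* b = static_cast<Base*>(table[0]);   // recover the same type
    return b->id;
}
```

Storing the Derived* address in the table and later reading it back as a Base* would skip the pointer adjustment and read from the wrong location, which is undefined behaviour; that unsafe variant is deliberately left out of the sketch.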
On Fri, May 28, 2010 at 4:03 AM, Christian Perez
<christian.perez AT inria.fr>
wrote:
Here is a simplified test program that fails (line 31), while the version
with line 30 enabled instead works (with one process).
After compiling, it may happen that the 1st run works ... so it may be a
race condition.
Running it with 2 processes seems to be ok on my machine.
Christian
On 05/28/2010 04:03 AM, Gengbin Zheng wrote:
I wrote a simple test program passing either thishandle or a CProxy, and
it seems to work both ways.
I think it would be best if you could write a simple standalone program
that reproduces your problem and send it to us for investigation.
Gengbin
On Thu, May 27, 2010 at 8:51 AM, Christian Perez
<christian.perez AT inria.fr>
wrote:
But, why does it work if I use the parameter 'psi'?
If I remove the 'sync', there is a synchronization issue :(
Christian
On 05/26/2010 09:54 PM, Gengbin Zheng wrote:
Remove [sync] from your ci file. Calling a [sync] entry method blocks the
caller, which in your case is not a threaded entry method.
Gengbin
On Wed, May 26, 2010 at 10:29 AM, Christian Perez
<christian.perez AT inria.fr>
wrote:
On 05/26/2010 05:00 PM, Gengbin Zheng wrote:
On Wed, May 26, 2010 at 6:37 AM, Christian Perez
<christian.perez AT inria.fr>
wrote:
I produce this bug with this piece of code, with only one process.
On 05/25/2010 07:22 PM, Gengbin Zheng wrote:
It is hard to eyeball errors from segmented code.
Is the caller of connect() a thread (I mean, is connect() a threaded
entry function)?
It is not a threaded charm method: when I tried to turn it into a
[threaded] entry method, I got an error when invoking a JNI method from
this chare. I shall investigate it later.
Another way of doing this without using a thread is to send the
component_B proxy to component_A, and let component_A directly send its
proxy to component_B.
That is the alternative solution: the good news is that it works (*);
the not-so-good news is that it makes the mapping between components
and charm objects a little more complex.
(*) I need to use a hack that seems weird to me:
Chare component_A : SimpleInterface {
  void set_s(CProxy_SetSimpleInterface& pssi,
             CProxy_SimpleInterface& psi) {
    pssi.set_si(psi);
  }
works, while the following version does not work (psi is changed to
thishandle):
Chare component_A : SimpleInterface {
  void set_s(CProxy_SetSimpleInterface& pssi,
             CProxy_SimpleInterface& psi) {
    pssi.set_si(thishandle);
  }
psi is obtained from a call to CProxy_component_A::ckNew.
Question: how can I create a Proxy from within a chare?
I tried
pssi.set_si(CProxy_SimpleInterface(thishandle));
but it fails with the same error :
[0] Assertion "n<len" failed in file cklists.h line 221.
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason:
[0] Stack Traceback:
[0:0] CmiAbort+0x89 [0x506ff2]
[0:1] __cmi_assert+0x4b [0x514a0c]
[0:2] _ZN5CkVecIPvEixEm+0x36 [0x49c7a8]
[0:3] /opt/usr/stow/ULCMi/libexec/Charm-launcher [0x4966f9]
[0:4] _Z15_processHandlerPvP11CkCoreState+0x30d [0x49712e]
[0:5] CmiHandleMessage+0x84 [0x50f5f1]
[0:6] CsdSchedulePoll+0x96 [0x50fb5f]
[0:7] CsdScheduler+0x23 [0x50f8bf]
[0:8] CthStandinCode+0xe [0x50fc5a]
[0:9] CthStartThread+0x59 [0x4830b7]
[0:10] /lib/libc.so.6 [0x7f98d9aaf370]
Fatal error on PE 0>
Thanks for your help!
What is in set_si?
ci file:
chare SingleProvider : SimpleInterface {
  ...
  entry [sync] void provider_set_s(CProxy_SetSimpleInterface& pssi,
      CProxy_SimpleInterface& psi, int n, char name[n]);
  ...
}
C file:
class SingleProvider : virtual public CBase_SingleProvider {
  ...
  void provider_set_s(CProxy_SetSimpleInterface& pssi,
      CProxy_SimpleInterface& psi, int n, char* name) {
    pssi.ulcmi_set_si(CProxy_SimpleInterface(thishandle), n, name);
  }
It also fails if I use thishandle instead of
CProxy_SimpleInterface(thishandle).
It works fine if I use pssi (which has been created with ckNew of
CProxy_SingleProvider).
Can you build your program with -g and, when it crashes again, get the
stack trace, like:
Here is the output within gdb:
[0] Assertion "n<len" failed in file cklists.h line 221.
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason:
[0] Stack Traceback:
[0:0] CmiAbort+0x89 [0x506f02]
[0:1] __cmi_assert+0x4b [0x51491c]
[0:2] _ZN5CkVecIPvEixEm+0x36 [0x49c6b8]
[0:3] /opt/usr/stow/ULCMi/libexec/Charm-launcher [0x496609]
[0:4] _Z15_processHandlerPvP11CkCoreState+0x30d [0x49703e]
[0:5] CmiHandleMessage+0x84 [0x50f501]
[0:6] CsdSchedulePoll+0x96 [0x50fa6f]
[0:7] CsdScheduler+0x23 [0x50f7cf]
[0:8] CthStandinCode+0xe [0x50fb6a]
[0:9] CthStartThread+0x59 [0x482fc7]
[0:10] /lib/libc.so.6 [0x7ffff5729370]
CHARM++ FATAL ERROR:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000506f31 in CmiAbort (message=0x53d785 "") at machine.c:612
612 *(int *)NULL = 0; /*Write to null, causing bus error*/
(gdb) bt
#0 0x0000000000506f31 in CmiAbort (message=0x53d785 "") at machine.c:612
#1 0x000000000051491c in __cmi_assert (expr=0x534e87 "n<len", file=0x534e7d "cklists.h", line=221) at convcore.c:3399
#2 0x000000000049c6b8 in CkVec<void*>::operator[] (this=0x7ffff0067aa0, n=140737226103192) at cklists.h:221
#3 0x0000000000496609 in _processForPlainChareMsg (ck=0x7ffff006c640, env=0x7ffff007e3e8) at ck.C:844
#4 0x000000000049703e in _processHandler (converseMsg=0x7ffff007e3e8, ck=0x7ffff006c640) at ck.C:1117
#5 0x000000000050f501 in CmiHandleMessage (msg=0x7ffff007e3e8) at convcore.c:1393
#6 0x000000000050fa6f in CsdSchedulePoll () at convcore.c:1610
#7 0x000000000050f7cf in CsdScheduler (maxmsgs=0) at convcore.c:1491
#8 0x000000000050fb6a in CthStandinCode () at convcore.c:1674
#9 0x0000000000482fc7 in CthStartThread (fn1=0, fn2=5307228, arg1=0, arg2=0) at threads.c:1579
#10 0x00007ffff5729370 in ?? () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()
addr2line -e ./your_binary 0x4966f9
Can you send me the output of that for lines:
[0:2] _ZN5CkVecIPvEixEm+0x36 [0x49c7a8]
[0:3] /opt/usr/stow/ULCMi/libexec/Charm-launcher [0x4966f9]
addr2line -e /opt/usr/stow/ULCMi/libexec/Charm-launcher 0x49c6b8
/home/cperez/Research/charm-6.2.0/net-linux-x86_64-smp/tmp/cklists.h:222
addr2line -e /opt/usr/stow/ULCMi/libexec/Charm-launcher 0x496609
/home/cperez/Research/charm-6.2.0/net-linux-x86_64-smp/tmp/ck.C:844
btw, did you ever run megatest (in charm/tests/charm++/megatest)
successfully? Just wanted to make sure charm threaded entry method
indeed works on your system.
I ran it up to "++p 16" and it works fine.
My system is a linux box (Intel(R) Core(TM)2 Duo CPU) running
debian/unstable (gcc (Debian 4.4.4-2) 4.4.4).
Thank you
Christian