- From: Jozsef Bakosi <jbakosi AT lanl.gov>
- To: charm AT lists.cs.illinois.edu
- Subject: [charm] Restart from checkpoint files
- Date: Sat, 15 Jun 2019 05:58:19 -0600
Hi folks,
When restarting from checkpoint files, I get the following error:
[0]CkRestartMain done. sending out callback.
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: CmiFree reference count was zero-- is this a duplicate free?
[0] Stack Traceback:
[0:0] [0x76464e]
[0:1] [0x761898]
[0:2] [0x768b04]
[0:3] CkFreeMsg+0x28 [0x60a398]
[0:4] [0x5dab08]
[0:5] [0x76430a]
[0:6] [0x763c08]
[0:7] [0x5dc45e]
[0:8] [0x5d4a12]
[0:9] __libc_start_main+0xeb [0x7fffec65309b]
[0:10] [0x4d790a]
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 3 DUP
FROM 0 with errorcode 1
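Most frames in the traceback above are unsymbolized, so one thing that might help is resolving the raw addresses with addr2line from GNU binutils. Below is a minimal sketch, assuming the executable was linked with -g and is not position-independent. The executable name ./app is a placeholder, and the helper names are made up for illustration.

```python
import re
import shlex

# Matches Charm++ traceback frames such as "[0:3] CkFreeMsg+0x28 [0x60a398]".
# The address of interest is the one inside the trailing brackets.
FRAME_RE = re.compile(r"^\[\d+:\d+\]\s.*\[(0x[0-9a-fA-F]+)\]\s*$")

# Traceback copied from the error output above.
TRACEBACK = """\
[0] Stack Traceback:
[0:0] [0x76464e]
[0:1] [0x761898]
[0:2] [0x768b04]
[0:3] CkFreeMsg+0x28 [0x60a398]
[0:4] [0x5dab08]
[0:5] [0x76430a]
[0:6] [0x763c08]
[0:7] [0x5dc45e]
[0:8] [0x5d4a12]
[0:9] __libc_start_main+0xeb [0x7fffec65309b]
[0:10] [0x4d790a]
"""


def frame_addresses(traceback_text):
    """Return the frame addresses in the order they appear in the traceback."""
    addresses = []
    for line in traceback_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            addresses.append(match.group(1))
    return addresses


def addr2line_command(executable, addresses):
    """Build an addr2line call: -f prints function names, -C demangles C++."""
    return ["addr2line", "-f", "-C", "-e", executable] + list(addresses)


if __name__ == "__main__":
    # "./app" is a hypothetical path; substitute the real executable.
    # Frames inside shared libraries (such as the libc frame) will not
    # resolve against the executable and are expected to print "??".
    cmd = addr2line_command("./app", frame_addresses(TRACEBACK))
    print(shlex.join(cmd))
```

Running the printed command should give a function name and source location per frame, which may be enough to tell which entry method or PUP routine is freeing the message.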
I tried the link option -memory paranoid and the runtime option ++debug,
and I also built Charm++ in debug mode, but none of these gave me any
more information. The build command was:
build charm++ mpi-linux-x86_64 --enable-error-checking --with-prio-type=int
--enable-randomized-msgq --suffix randq-debug --build-shared -j36 -w
-stdlib=libc++ -g
How can I get more information about which chare or group triggers the
error when restarting from a checkpoint?
Thanks,
Jozsef