charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
- From: "Kale, Laxmikant V" <kale AT illinois.edu>
- To: "Wang, Felix Y." <wang65 AT llnl.gov>, "charm AT cs.illinois.edu" <charm AT cs.illinois.edu>
- Subject: Re: [charm] Fault Tolerance Documentation
- Date: Wed, 25 Jul 2012 12:19:25 +0000
- Accept-language: en-US
- List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
- List-id: CHARM parallel programming system <charm.cs.uiuc.edu>
Thanks for your comments, Felix. We are in the process of updating the manuals (and, hopefully soon, writing a tutorial), and we will keep your comments in mind. For now, I would like to know one thing. There is a Charm++ manual online (at http://charm.cs.illinois.edu/manuals/; it is the one titled "Charm++ language"), and Section 6 is about fault tolerance. Was the information there inadequate or incorrect? We plan to add at least one example program for checkpointing anyway, but it will be useful to know the specific deficiencies of that manual section as we work on improving it.
--
Laxmikant (Sanjay) Kale http://charm.cs.uiuc.edu
Professor, Computer Science kale AT illinois.edu
201 N. Goodwin Avenue Ph: (217) 244-0094
Urbana, IL 61801-2302 FAX: (217) 265-6582
On 7/24/12 5:28 PM, "Wang, Felix Y." <wang65 AT llnl.gov> wrote:
Hello PPL,
I'm an intern at LLNL over the summer. I've been working on porting the LULESH proxy application to Charm++, and these past few days I have started to add some constructs for fault tolerance (checkpoints/restarts). Unfortunately, the documentation generally available online is rather sparse, and it does not point to any good examples of how checkpointing and restarting are actually used in a program. Fortunately, I was able to meet with Xiang to discuss how to actually do the implementation, and she pointed me to some example code, showed me how to build Charm++ with these constructs enabled in the first place, and helped with other necessary items.
Please take this email as a request to provide a more comprehensive manual section on the fault tolerance aspects of Charm++. A section or link to a tutorial, such as the one for the PUPers, would also be helpful.
Thanks,
--- Felix
- [charm] Fault Tolerance Documentation, Wang, Felix Y., 07/24/2012
- Re: [charm] Fault Tolerance Documentation, Kale, Laxmikant V, 07/25/2012