charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
- From: "Brunner, Robert Kraemer" <rbrunner AT illinois.edu>
- To: "charm AT cs.illinois.edu" <charm AT cs.illinois.edu>
- Subject: [charm] Charm/Converse fault tolerance on BW?
- Date: Thu, 17 Mar 2016 15:18:37 +0000
Hi,
What is the state of fault tolerance support on Cray XE systems (in
particular, Blue Waters) with respect to allowing user code to catch node
failures? Can the runtime notify the user program that a node has failed and
allow the user program to handle the failure, and perhaps keep running,
taking the loss of the node and any associated objects into account?
Robert
----------------------------------------------
Robert Brunner
Blue Waters Science and Engineering Applications Support
National Center for Supercomputing Applications
4006F NCSA Building, MC-257
1205 W Clark St
Urbana, IL 61801
217-333-7677
rbrunner AT illinois.edu