charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
- From: <alberto.ortiz09 AT gmail.com>
- To: charm AT lists.cs.illinois.edu
- Subject: [charm] Fault Tolerance
- Date: Thu, 16 Feb 2017 11:40:46 -0600
Hi,
I am using AMPI on a Zynq cluster in which each Zynq has a dual-core ARM
processor. Currently I am using 3 MicroZed boards (each one has a Zynq device).
I chose AMPI from the start over OpenMPI because it offers the user fault
tolerance, adaptability and resilience.
The problem I have is that I don't know how to use or activate its fault
tolerance. I am programming in C using MPI and compiling the programs with
ampicc. The fault tolerance test I would like to try is to have the 3 devices
running a task and then reboot or unplug one of them, expecting AMPI to
redistribute the threads that were started on the unplugged device to the
working devices. I don't know whether this kind of fault tolerance is
implemented, or how to take advantage of the fault tolerance that is available.
Another thing I would like to ask is whether AMPI supports run-time load
balancing. For example, if I were to multiply 10 big matrices and one node
finished its task before the others, how could I implement run-time load
balancing so that the idle node takes over work from the overloaded nodes?
Thank you in advance for the continuous support,
Alberto.