charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
- From: Kiril Dichev <K.Dichev AT qub.ac.uk>
- To: charm AT lists.cs.illinois.edu
- Subject: [charm] Fault tolerant Jacobi
- Date: Fri, 20 Jul 2018 16:15:44 +0100
Hello,
I am a new user of Charm++ and AMPI.
I’ve done some research on fault tolerance in MPI over the past year, and I see some nice ways to couple it with AMPI (happy to explain if anyone is interested). I have used a Jacobi solver before, so it would be nice to use the same one with AMPI to get going. I am especially interested in testing the parallel recovery capabilities presented in work such as this one, for Jacobi among other codes: https://repositoriotec.tec.ac.cr/bitstream/handle/2238/7150/Using%20Migratable%20Objects%20to%20Enhance%20Fault%20Tolerance%20Schemes%20in%20Supercomputers.pdf?sequence=1&isAllowed=y
However, I am not sure where to begin. I pulled the official Charm++ repo, which contains some MPI Jacobi code in tests/ampi/jacobi3d. That directory also has some kill files, which a very old tutorial tells me can be used to specify failure scenarios for PEs. However, the +pkill_file option no longer seems to exist, so the tutorial is outdated, and I don’t know whether the code is up to date either.
On the other hand, according to the documentation in the main repo, there is a repo here:
… which I can’t access, and which apparently also has Jacobi codes I can run with AMPI. Maybe that is the one I need? If so, can I use it even though I’m not affiliated with any US institution?
Any help identifying which Jacobi + AMPI code is up to date would be much appreciated. In addition, any advice on how to experiment with parallel recovery via migration would be great.
Regards,
Kiril Dichev