- From: Phil Miller <phil AT hpccharm.com>
- To: charm <charm AT lists.cs.illinois.edu>
- Subject: [charm] Charm++ 6.8.0 Beta Release
- Date: Thu, 6 Apr 2017 16:25:19 -0500
- Calls to entry methods taking a single fixed-size parameter can now
automatically be aggregated and routed through the TRAM library by
marking them with the [aggregate] attribute (see the sketches after this list).
- Calls to parameter-marshalled entry methods with large array
arguments can ask for asynchronous zero-copy send behavior with an
`rdma' tag in the parameter's declaration (sketch below).
- The runtime system now integrates an OpenMP runtime library so that
code using OpenMP parallelism will dispatch work to idle worker
threads within the Charm++ process (sketch below).
- Applications can ask the runtime system to perform automatic
high-level end-of-run performance analysis by linking with the
`-tracemode perfReport' option (example below).
- Added a new dynamic remapping/load-balancing strategy,
GreedyRefineLB, that offers high result quality and well-bounded
execution time (example below).
- Improved and expanded topology-aware spanning tree generation
strategies, including support for runs on a torus with holes, such
as Blue Waters and other Cray XE/XK systems.
- Charm++ programs can now define their own main() function, rather
than using a generated implementation from a mainmodule/mainchare
combination. This extends the existing Charm++/MPI interoperation
feature (sketch below).
- GPU manager now creates one instance per OS process and scales the
pre-allocated memory pool size according to the GPU memory size and
number of GPU manager instances on a physical node.
- Several GPU manager API changes, including:
* Replaced references to global variables in the GPU manager API with calls to
functions.
* The user is no longer required to specify a bufferID in the dataInfo struct.
* Replaced calls to kernelSelect with direct invocation of functions passed
via the work request object (allows CUDA to be built with all programs).
- Added support for malleable jobs that can dynamically shrink and
expand the set of compute nodes hosting Charm++ processes.
- Greatly expanded and improved reduction operations:
* Added built-in reductions for all logical and bitwise operations
on integer and boolean input (sketch below).
* Reductions over groups and chare arrays that apply commutative,
associative operations (e.g. MIN, MAX, SUM, AND, OR, XOR) are now
processed in a streaming fashion. This reduces the memory footprint of
reductions. User-defined reductions can opt into this mode as well.
* Added a new `Tuple' reducer that allows combining multiple reductions
of different input data and operations from a common set of source
objects to a single target callback (sketch below).
* Added a new `Summary Statistics' reducer that provides count, mean,
and standard deviation using a numerically-stable streaming algorithm.
- Added a `++quiet' option to suppress charmrun and charm++ non-error
messages at startup (example below).
- Calls to chare array element entry methods with the [inline] tag now
avoid copying their arguments when the called method takes its
parameters by const&, offering a substantial reduction in overhead in
those cases (sketch below).
- Synchronous entry methods that block until completion (marked with
the [sync] attribute) can now return any type that defines a PUP
method, rather than only message types (sketch below).
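
The rest of this message gives small, illustrative sketches for several of
the items above. Identifiers that do not appear in the notes (module, chare,
method, and file names) are placeholders, and the 6.8.0 manual remains the
authoritative reference for exact syntax.

[aggregate] entry methods: a minimal interface-file sketch, assuming a 1D
chare array whose entry method takes a single fixed-size parameter.

    // example.ci (names hypothetical; only the [aggregate] attribute
    // comes from the item above)
    array [1D] Receiver {
      entry Receiver();
      // Calls carrying the single int are batched by the runtime and
      // routed through TRAM rather than sent one message at a time.
      entry [aggregate] void accept(int value);
    };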
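Zero-copy sends: a sketch of an entry-method declaration that marks a large
array parameter with the `rdma' tag. Since the send is asynchronous, the
source buffer presumably has to remain valid until the runtime reports
completion; the manual describes the notification mechanism.

    // example.ci (entry and parameter names hypothetical)
    entry void receive(int n, rdma int data[n]);  // data is sent zero-copy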
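Integrated OpenMP: a sketch of an entry-method body that parallelizes a loop
with a standard OpenMP pragma; with the integrated runtime, those iterations
are executed by otherwise-idle Charm++ worker threads in the same process.
The class and helper names are placeholders.

    // In the chare's implementation file (Worker and heavyKernel are
    // placeholders)
    #include <omp.h>

    void Worker::doWork(int n, double *data) {
      #pragma omp parallel for   // iterations run on idle Charm++ workers
      for (int i = 0; i < n; i++) {
        data[i] = heavyKernel(data[i]);
      }
    }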
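Automatic end-of-run analysis: the option is given at link time, for example:

    charmc -o myapp myapp.o -tracemode perfReport   # 'myapp' is a placeholder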
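GreedyRefineLB: assuming the usual load-balancer conventions apply to the new
strategy, it would be linked in as a module and selected at run time:

    charmc -o myapp myapp.o -module GreedyRefineLB
    ./charmrun +p8 ./myapp +balancer GreedyRefineLB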
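User-defined main(): the item describes this as an extension of the existing
Charm++/MPI interoperation feature, so the sketch below shows the shape of
that pre-existing pattern (CharmLibInit/CharmLibExit from mpi-interoperate.h).
The exact entry points for the new stand-alone path are not spelled out in
these notes, so treat this only as an illustration.

    // main.C -- sketch of the MPI-interoperation pattern this builds on
    #include <mpi.h>
    #include "mpi-interoperate.h"

    int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);
      MPI_Comm comm;
      MPI_Comm_dup(MPI_COMM_WORLD, &comm);

      CharmLibInit(comm, argc, argv);  // start the Charm++ runtime
      // ... invoke Charm++ library entry points here ...
      CharmLibExit();                  // shut it down before MPI_Finalize

      MPI_Comm_free(&comm);
      MPI_Finalize();
      return 0;
    }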
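Built-in logical/bitwise reductions: a contribute call using one of the new
reducer types. I am assuming the new reducers follow the existing
<operation>_<type> naming scheme (e.g. CkReduction::logical_and_bool); the
manual lists the exact names. Main, allConverged, and mainProxy are
placeholders.

    // Inside an array element's entry method
    bool flag = locallyConverged();   // placeholder helper
    CkCallback cb(CkReductionTarget(Main, allConverged), mainProxy);
    contribute(sizeof(bool), &flag, CkReduction::logical_and_bool, cb);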
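Tuple reducer: a sketch that combines a sum and a max from the same elements
into one callback, using CkReduction::tupleElement and
CkReductionMsg::buildFromTuple as I recall them from the manual; verify the
exact signatures there. The helper and chare names are placeholders.

    // Inside an array element's entry method
    int    localCount = computeLocalCount();   // placeholder helpers
    double localMax   = computeLocalMax();

    CkReduction::tupleElement parts[] = {
      CkReduction::tupleElement(sizeof(int),    &localCount, CkReduction::sum_int),
      CkReduction::tupleElement(sizeof(double), &localMax,   CkReduction::max_double)
    };
    CkReductionMsg *msg = CkReductionMsg::buildFromTuple(parts, 2);
    msg->setCallback(CkCallback(CkIndex_Main::report(NULL), mainProxy));
    contribute(msg);

The target entry method (here Main::report) receives a single CkReductionMsg
carrying both reduced values.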
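++quiet: suppresses the startup banner and other non-error output, e.g.:

    ./charmrun +p8 ./myapp ++quiet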
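[inline] with const& parameters: a sketch of a chare-array entry method
declared [inline] whose C++ signature takes its argument by const reference,
so a call to an element on the same PE can avoid copying the argument. Names
are placeholders, and I am assuming the .ci declaration mirrors the C++
signature.

    // example.ci
    array [1D] Elem {
      entry [inline] void process(const std::vector<double> &data);
    };

    // example.C
    void Elem::process(const std::vector<double> &data) {
      // For a same-PE call, the runtime can hand the caller's vector
      // through without copying it.
      consume(data);   // placeholder
    }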
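[sync] entry methods returning PUP-able types: a sketch in which a blocking
call returns a small user-defined struct rather than a message. The caller
must still run inside a [threaded] entry method, since [sync] calls block.
All names are placeholders, and the .ci module is assumed to see the header
that declares the return type.

    // record.h
    struct Record {
      int    key;
      double value;
      void pup(PUP::er &p) { p | key; p | value; }
    };

    // example.ci
    array [1D] Table {
      entry [sync] Record lookup(int key);
    };

    // Caller, from a [threaded] entry method:
    Record r = tableProxy[17].lookup(42);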