- From: Phil Miller <phil AT hpccharm.com>
- To: charm <charm AT lists.cs.illinois.edu>
- Subject: [charm] Charm++ 6.8.0 Beta Release
- Date: Thu, 6 Apr 2017 16:25:19 -0500
- Calls to entry methods taking a single fixed-size parameter can now
automatically be aggregated and routed through the TRAM library by
marking them with the [aggregate] attribute (see the sketches after this list).
- Calls to parameter-marshalled entry methods with large array
arguments can ask for asynchronous zero-copy send behavior with an
`rdma' tag in the parameter's declaration (sketch below).
- The runtime system now integrates an OpenMP runtime library so that
code using OpenMP parallelism will dispatch work to idle worker
threads within the Charm++ process (sketch below).
- Applications can ask the runtime system to perform automatic
high-level end-of-run performance analysis by linking with the
`-tracemode perfReport' option (example below).
- Added a new dynamic remapping/load-balancing strategy,
GreedyRefineLB, that offers high result quality and well-bounded
execution time (example below).
- Improved and expanded topology-aware spanning tree generation
strategies, including support for runs on a torus with holes, such
as Blue Waters and other Cray XE/XK systems.
- Charm++ programs can now define their own main() function, rather
than using a generated implementation from a mainmodule/mainchare
combination. This extends the existing Charm++/MPI interoperation
feature (sketch below).
- GPU manager now creates one instance per OS process and scales the
pre-allocated memory pool size according to the GPU memory size and
number of GPU manager instances on a physical node.
- Several GPU manager API changes, including:
* Replaced references to global variables in the GPU manager API with calls to
functions.
* The user is no longer required to specify a bufferID in the dataInfo struct.
* Replaced calls to kernelSelect with direct invocation of functions passed
via the work request object (allows CUDA to be built with all programs).
- Added support for malleable jobs that can dynamically shrink and
expand the set of compute nodes hosting Charm++ processes.
- Greatly expanded and improved reduction operations:
* Added built-in reductions for all logical and bitwise operations
on integer and boolean input (sketch below).
* Reductions over groups and chare arrays that apply commutative,
associative operations (e.g. MIN, MAX, SUM, AND, OR, XOR) are now
processed in a streaming fashion. This reduces the memory footprint of
reductions. User-defined reductions can opt into this mode as well.
* Added a new `Tuple' reducer that allows combining multiple reductions
of different input data and operations from a common set of source
objects to a single target callback (sketch below).
* Added a new `Summary Statistics' reducer that provides count, mean,
and standard deviation using a numerically-stable streaming algorithm.
- Added a `++quiet' option to suppress charmrun and charm++ non-error
messages at startup (example below).
- Calls to chare array element entry methods with the [inline] tag now
avoid copying their arguments when the called method takes its
parameters by const&, offering a substantial reduction in overhead in
those cases (sketch below).
- Synchronous entry methods that block until completion (marked with
the [sync] attribute) can now return any type that defines a PUP
method, rather than only message types (sketch below).
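
The rest of this message gives small, illustrative sketches for several of
the items above. Identifiers that do not appear in the notes (module, chare,
method, and file names) are placeholders, and the 6.8.0 manual remains the
authoritative reference for exact syntax.

[aggregate] entry methods: a minimal interface-file sketch, assuming a 1D
chare array whose entry method takes a single fixed-size parameter.

    // example.ci (names hypothetical; only the [aggregate] attribute
    // comes from the item above)
    array [1D] Receiver {
      entry Receiver();
      // Calls carrying the single int are batched by the runtime and
      // routed through TRAM rather than sent one message at a time.
      entry [aggregate] void accept(int value);
    };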
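Zero-copy sends: a sketch of an entry-method declaration that marks a large
array parameter with the `rdma' tag. Since the send is asynchronous, the
source buffer presumably has to remain valid until the runtime reports
completion; the manual describes the notification mechanism.

    // example.ci (entry and parameter names hypothetical)
    entry void receive(int n, rdma int data[n]);  // data is sent zero-copy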
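Integrated OpenMP: a sketch of an entry-method body that parallelizes a loop
with a standard OpenMP pragma; with the integrated runtime, those iterations
are executed by otherwise-idle Charm++ worker threads in the same process.
The class and helper names are placeholders.

    // In the chare's implementation file (Worker and heavyKernel are
    // placeholders)
    #include <omp.h>

    void Worker::doWork(int n, double *data) {
      #pragma omp parallel for   // iterations run on idle Charm++ workers
      for (int i = 0; i < n; i++) {
        data[i] = heavyKernel(data[i]);
      }
    }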
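Automatic end-of-run analysis: the option is given at link time, for example:

    charmc -o myapp myapp.o -tracemode perfReport   # 'myapp' is a placeholder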
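GreedyRefineLB: assuming the usual load-balancer conventions apply to the new
strategy, it would be linked in as a module and selected at run time:

    charmc -o myapp myapp.o -module GreedyRefineLB
    ./charmrun +p8 ./myapp +balancer GreedyRefineLB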
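User-defined main(): the item describes this as an extension of the existing
Charm++/MPI interoperation feature, so the sketch below shows the shape of
that pre-existing pattern (CharmLibInit/CharmLibExit from mpi-interoperate.h).
The exact entry points for the new stand-alone path are not spelled out in
these notes, so treat this only as an illustration.

    // main.C -- sketch of the MPI-interoperation pattern this builds on
    #include <mpi.h>
    #include "mpi-interoperate.h"

    int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);
      MPI_Comm comm;
      MPI_Comm_dup(MPI_COMM_WORLD, &comm);

      CharmLibInit(comm, argc, argv);  // start the Charm++ runtime
      // ... invoke Charm++ library entry points here ...
      CharmLibExit();                  // shut it down before MPI_Finalize

      MPI_Comm_free(&comm);
      MPI_Finalize();
      return 0;
    }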
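Built-in logical/bitwise reductions: a contribute call using one of the new
reducer types. I am assuming the new reducers follow the existing
<operation>_<type> naming scheme (e.g. CkReduction::logical_and_bool); the
manual lists the exact names. Main, allConverged, and mainProxy are
placeholders.

    // Inside an array element's entry method
    bool flag = locallyConverged();   // placeholder helper
    CkCallback cb(CkReductionTarget(Main, allConverged), mainProxy);
    contribute(sizeof(bool), &flag, CkReduction::logical_and_bool, cb);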
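Tuple reducer: a sketch that combines a sum and a max from the same elements
into one callback, using CkReduction::tupleElement and
CkReductionMsg::buildFromTuple as I recall them from the manual; verify the
exact signatures there. The helper and chare names are placeholders.

    // Inside an array element's entry method
    int    localCount = computeLocalCount();   // placeholder helpers
    double localMax   = computeLocalMax();

    CkReduction::tupleElement parts[] = {
      CkReduction::tupleElement(sizeof(int),    &localCount, CkReduction::sum_int),
      CkReduction::tupleElement(sizeof(double), &localMax,   CkReduction::max_double)
    };
    CkReductionMsg *msg = CkReductionMsg::buildFromTuple(parts, 2);
    msg->setCallback(CkCallback(CkIndex_Main::report(NULL), mainProxy));
    contribute(msg);

The target entry method (here Main::report) receives a single CkReductionMsg
carrying both reduced values.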
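++quiet: suppresses the startup banner and other non-error output, e.g.:

    ./charmrun +p8 ./myapp ++quiet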
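[inline] with const& parameters: a sketch of a chare-array entry method
declared [inline] whose C++ signature takes its argument by const reference,
so a call to an element on the same PE can avoid copying the argument. Names
are placeholders, and I am assuming the .ci declaration mirrors the C++
signature.

    // example.ci
    array [1D] Elem {
      entry [inline] void process(const std::vector<double> &data);
    };

    // example.C
    void Elem::process(const std::vector<double> &data) {
      // For a same-PE call, the runtime can hand the caller's vector
      // through without copying it.
      consume(data);   // placeholder
    }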
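[sync] entry methods returning PUP-able types: a sketch in which a blocking
call returns a small user-defined struct rather than a message. The caller
must still run inside a [threaded] entry method, since [sync] calls block.
All names are placeholders, and the .ci module is assumed to see the header
that declares the return type.

    // record.h
    struct Record {
      int    key;
      double value;
      void pup(PUP::er &p) { p | key; p | value; }
    };

    // example.ci
    array [1D] Table {
      entry [sync] Record lookup(int key);
    };

    // Caller, from a [threaded] entry method:
    Record r = tableProxy[17].lookup(42);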