ppl-accel AT lists.siebelschool.illinois.edu
Subject: Ppl-accel mailing list
List archive
- From: Lukasz Wesolowski <wesolwsk AT illinois.edu>
- To: Ronak Buch <rabuch2 AT illinois.edu>
- Cc: "ppl-accel AT cs.uiuc.edu" <ppl-accel AT cs.uiuc.edu>
- Subject: Re: [ppl-accel] 5/9 Accel Meeting Minutes
- Date: Mon, 19 May 2014 20:46:53 +0800
- List-archive: <http://lists.cs.uiuc.edu/pipermail/ppl-accel/>
- List-id: <ppl-accel.cs.uiuc.edu>
I believe we were planning to have a meeting/telecon today at 11 am. Let's have everyone send a quick update on progress since the last meeting. We will meet if there are significant new results or issues.
In particular, here are some of the action items from the last meeting:
1. OpenAtom GPU runs: profiling and experiments on larger data sets (Eric)
On my end, I looked at cuBLAS to see whether it can be supported in GPU Manager. As Ronak mentioned last time, cuBLAS now allows specifying a CUDA stream in which its operations execute. I also noticed that cuBLAS has its own custom functions for transferring data to and from the GPU, so support for those would have to be explicitly added in GPU Manager. Overall, adding cuBLAS support to GPU Manager looks doable.
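For concreteness, here is a hedged sketch of the two cuBLAS features mentioned above: binding a handle to a user-supplied CUDA stream, and using cuBLAS's own transfer helpers. All names (d_A, h_x, the function itself) are illustrative, and error checking is elided; this is not GPU Manager code.

```cuda
/* Sketch: enqueue a cuBLAS SGEMV and its transfers in one CUDA stream.
 * Device buffers d_A, d_x, d_y are assumed already allocated. */
#include <cublas_v2.h>
#include <cuda_runtime.h>

void sgemv_in_stream(cublasHandle_t handle, cudaStream_t stream,
                     const float *h_A, const float *h_x, float *h_y,
                     float *d_A, float *d_x, float *d_y, int n)
{
    const float alpha = 1.0f, beta = 0.0f;

    /* All subsequent cuBLAS calls on this handle run in `stream`. */
    cublasSetStream(handle, stream);

    /* cuBLAS's custom transfer functions (the async variants take a
     * stream); these are the calls GPU Manager would have to wrap. */
    cublasSetMatrixAsync(n, n, sizeof(float), h_A, n, d_A, n, stream);
    cublasSetVectorAsync(n, sizeof(float), h_x, 1, d_x, 1, stream);

    /* y = A * x, enqueued in the same stream after the transfers. */
    cublasSgemv(handle, CUBLAS_OP_N, n, n, &alpha, d_A, n, d_x, 1,
                &beta, d_y, 1);

    cublasGetVectorAsync(n, sizeof(float), d_y, 1, h_y, 1, stream);
}
```

Because everything is ordered within one stream, the transfers and the kernel never block the host, which is the property GPU Manager needs for overlap.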
Lukasz
On Sat, May 10, 2014 at 12:58 AM, Ronak Buch <rabuch2 AT illinois.edu> wrote:
Attendees: Ronak, Eric M., Michael, Lukasz

Overview of various accelerator tools that exist in Charm++:

Eric is using GPUs for OpenAtom, but testing was only on a very small data set; it's not clear whether we are getting good performance, since the timing was coarse and the input was small. It currently uses cuBLAS with synchronous kernel calls.
GPU Manager:
- Lukasz currently maintains it; he plans to write documentation for it and fix stability issues if they arise.
- Task-based library: instead of treating GPU operations in isolation, it groups the transfer to the device, the computation, and the transfer back as a single unit and offloads the whole thing.
- Using it keeps the CPU from sitting idle while the GPU is working.
- One key aspect is that it has its own memory pool of pinned memory for GPU transfers. Otherwise, trying to allocate pinned memory while a kernel is executing will block.
- Not sure if overlapping communication with computation has changed in more recent versions of CUDA
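To illustrate the pinned-memory-pool point above, here is a minimal sketch (all names hypothetical; the real GPU Manager implementation differs). The idea is that buffers are pinned once at startup with cudaMallocHost, so later requests never call into the CUDA allocator and therefore cannot block behind a running kernel.

```cuda
/* Hedged sketch of a pinned host-memory pool (names hypothetical). */
#include <cuda_runtime.h>
#include <stddef.h>

#define POOL_SLOTS 8

typedef struct {
    void  *buf[POOL_SLOTS];
    int    in_use[POOL_SLOTS];
    size_t slot_size;
} PinnedPool;

/* Pin all buffers up front, before any kernels are launched. */
int pool_init(PinnedPool *p, size_t slot_size) {
    p->slot_size = slot_size;
    for (int i = 0; i < POOL_SLOTS; i++) {
        p->in_use[i] = 0;
        if (cudaMallocHost(&p->buf[i], slot_size) != cudaSuccess)
            return -1;  /* out of pinnable memory */
    }
    return 0;
}

/* Hand out a free pinned buffer, or NULL if the pool is exhausted;
 * no CUDA call is made here, so this never blocks on the GPU. */
void *pool_acquire(PinnedPool *p) {
    for (int i = 0; i < POOL_SLOTS; i++)
        if (!p->in_use[i]) { p->in_use[i] = 1; return p->buf[i]; }
    return NULL;
}

void pool_release(PinnedPool *p, void *buf) {
    for (int i = 0; i < POOL_SLOTS; i++)
        if (p->buf[i] == buf) { p->in_use[i] = 0; return; }
}
```

Pinned buffers are also what cudaMemcpyAsync needs to actually overlap transfers with computation, which ties this bullet to the overlap question below.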
Lukasz thinks that the Offload API would be a better fit than GPU Manager for the Xeon Phi.

Ronak and Michael worked on heterogeneous runs with the Xeon Phi; performance is rather slow.

We should take Dave Kunzman's thesis work (AEMs) and see how useful it is for various applications. Also, take a look at G-Charm (according to Lukasz, their techniques are basically the same as Kunzman's); there seems to be no code available for G-Charm.

Sanjay's TODOs:
- Read Dave Kunzman's thesis
- Run Projections or other performance monitoring tools on Xeon Phi applications
- Add multiple-++ppn (SMP) support for the Xeon Phi.
_______________________________________________
ppl-accel mailing list
ppl-accel AT cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/ppl-accel
- [ppl-accel] 5/9 Accel Meeting Minutes, Ronak Buch, 05/09/2014
- Re: [ppl-accel] 5/9 Accel Meeting Minutes, Lukasz Wesolowski, 05/19/2014
- Re: [ppl-accel] 5/9 Accel Meeting Minutes, Ronak Buch, 05/19/2014
- Re: [ppl-accel] 5/9 Accel Meeting Minutes, Mikida, Eric P, 05/19/2014
- Message not available
- Re: [ppl-accel] 5/9 Accel Meeting Minutes, Michael Robson, 05/19/2014
- Re: [ppl-accel] 5/9 Accel Meeting Minutes, Mikida, Eric P, 05/19/2014
- Re: [ppl-accel] 5/9 Accel Meeting Minutes, Mikida, Eric P, 05/26/2014
- Message not available
- Re: [ppl-accel] 5/9 Accel Meeting Minutes, Michael Robson, 05/26/2014
- Re: [ppl-accel] 5/9 Accel Meeting Minutes, Mikida, Eric P, 05/19/2014
- Re: [ppl-accel] 5/9 Accel Meeting Minutes, Michael Robson, 05/19/2014
- Re: [ppl-accel] 5/9 Accel Meeting Minutes, Ronak Buch, 05/19/2014
- Message not available
- Re: [ppl-accel] 5/9 Accel Meeting Minutes, Michael Robson, 05/19/2014
- Re: [ppl-accel] 5/9 Accel Meeting Minutes, Lukasz Wesolowski, 05/19/2014
Archive powered by MHonArc 2.6.16.