
Re: [charm] [ppl] NAMD Charmrun error on Ranger


  • From: Jim Phillips <jim AT ks.uiuc.edu>
  • To: Aditya Devarakonda <aditya08 AT cac.rutgers.edu>
  • Cc: Eric Bohm <ebohm AT illinois.edu>, Phil Miller <mille121 AT illinois.edu>, Abhishek Gupta <gupta59 AT illinois.edu>, Charm Mailing List <charm AT cs.illinois.edu>
  • Subject: Re: [charm] [ppl] NAMD Charmrun error on Ranger
  • Date: Thu, 17 May 2012 10:06:23 -0500 (CDT)
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>


The older runbatch scripts for ibverbs (2.7, maybe?) do this, as they predate the mpiexec launch option.

-Jim

On Wed, 16 May 2012, Aditya Devarakonda wrote:

For the time being, it might be better for our group to give the SMP
build a try.

If I understand charmrun's nodelist file correctly, we would need some support from TACC so that the machinefile generated implicitly for ibrun is also made available to charmrun. Is that accurate?
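
(For concreteness, a rough sketch of how such a nodelist might be put together inside the job script itself, assuming Ranger's SGE batch environment provides $PE_HOSTFILE with one "hostname slots queue ..." line per allocated host; the file name "nodelist" and the use of $NSLOTS are only illustrative:)

  # build a charmrun nodelist from the SGE host file (hypothetical job-script snippet)
  echo "group main ++shell ssh" > nodelist
  awk '{print "  host " $1}' $PE_HOSTFILE >> nodelist
  charmrun ++nodelist nodelist +p $NSLOTS namd2 $CONFFILE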

On Wed, 2012-05-16 at 15:23 -0500, Phil Miller wrote:
There's also the option of using charmrun's own process-launching mechanism instead
of the system's mpiexec, in order to get its more scalable tree
structure with ++hierarchical-start. The downside is that this
requires a nodelist file for charmrun to work with. Given that there
is a common NAMD launching script that many users can reference, I
don't think that's a big deal, since the logic only needs to be
implemented once.

It also looks like that option was never documented in the usage manual:
http://charm.cs.illinois.edu/manuals/html/install/4_1.html
For that matter, neither was ++scalable-start (one SSH connection per node).
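
(For the record, a rough sketch of what such a launch might look like, assuming a nodelist file named "nodelist" in the working directory and a 1024-processor run; the flags are the ones named above, but this exact combination is untested here:)

  charmrun ++nodelist nodelist ++hierarchical-start +p 1024 namd2 $CONFFILE
  # or, with one SSH connection per node:
  charmrun ++nodelist nodelist ++scalable-start +p 1024 namd2 $CONFFILE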

On Wed, May 16, 2012 at 3:11 PM, Eric Bohm <ebohm AT illinois.edu> wrote:
There is a P^2 startup and memory issue with the reliable channel
implementation on IBVERBS.

A simple way to reduce its impact is to use the SMP build. One can then
reduce the number of necessary processes to one per node by running with +p
numnodes +ppn 15, so that 15 worker threads per node multiplex across one
communication thread per node. You then have a (P/16)^2 issue instead, which will scale
much farther.
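
(As a concrete sketch of the above on Ranger's 16-core nodes, e.g. a 64-node job; the nodelist file and node count are placeholders, and the exact flag spelling may differ between Charm++ versions:)

  # 64 processes, one per node, each with 15 worker threads + 1 communication thread = 1024 cores
  charmrun +p 64 ++nodelist nodelist namd2 +ppn 15 $CONFFILE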

On 05/16/2012 09:46 AM, Jim Phillips wrote:
I think the mpiexec calls the ibrun script, which calls the real mpiexec.

-Jim


On Wed, 16 May 2012, Aditya Devarakonda wrote:

Thanks Jim,

So, the pre-loaded NAMD batch scripts on Ranger seem to use Charm with
the mpiexec option. Now, is there a better way of doing this (through
ibrun, perhaps)?

Maybe I'm wrong, but my concern with adjusting the timeout is that the
problem could always creep back as we increase the number of nodes.

Do you guys typically use mpiexec to start the NAMD processes on Ranger?

Regards,
Aditya

On Mon, 2012-05-14 at 09:58 -0500, Jim Phillips wrote:

Charmrun should have some options for adjusting the timeout. One goal of
using mpiexec was to make this process more similar to other jobs on the
machine, so the timeout may just need to be extended (I'm not sure what the
default is; that should probably be printed).
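
(If I recall correctly, the relevant option is ++timeout, which takes the number of seconds charmrun waits for each node program to connect; something along these lines, with 300 just an example value:)

  charmrun ++timeout 300 ++mpiexec ++remote-shell mympiexec ++runscript tacc_affinity namd2 $CONFFILE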

-Jim

On Sat, 12 May 2012, Aditya Devarakonda wrote:

Hi everyone,

Hope you guys are doing well. Our group has been working with NAMD for the
past couple of months and recently started running jobs on Ranger.

We have been seeing some issues when running on 1K or more processors. It
seems to be an issue with launching NAMD on remote nodes; we get the
following error:

Charmrun> error 64 attaching to node:
Timeout waiting for node-program to connect

We're using the NAMD_2.8_Linux-x86_64-ibverbs-Ranger build available on
Ranger and launching with mpiexec:

charmrun +p ++mpiexec ++remote-shell mympiexec ++runscript tacc_affinity namd2 $CONFFILE

We were able to scale successfully up to 512 processors, but not beyond.
Any ideas?

Thanks,
Aditya








