- From: Bilge Acun <acun2 AT illinois.edu>
- To: Jozsef Bakosi <jbakosi AT gmail.com>
- Cc: Abhinav Bhatele <bhatele AT illinoisalumni.org>, "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
- Subject: Re: [charm] [ppl] Best way to run Charm++ apps on an Infiniband cluster
- Date: Tue, 17 Feb 2015 14:06:22 -0600
- List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
- List-id: CHARM parallel programming system <charm.cs.uiuc.edu>
Hi Jozsef,
For QLogic hardware, the QLOGIC macro needs to be enabled when building Charm++.
Can you try building Charm++ again with the -DQLOGIC option added?
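A minimal sketch of that rebuild, assuming the same AMPI/ibverbs target used earlier in this thread; the trailing flag is simply passed through to the compilation:
$ ./build AMPI net-linux-x86_64 ibverbs -DQLOGIC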
Thanks,
Bilge Acun
PhD Candidate at University of Illinois at Urbana-Champaign
Computer Science Department
On 17 February 2015 at 09:33, Jozsef Bakosi <jbakosi AT gmail.com> wrote:
Thanks, Jim and Abhinav, this helps. However, this is what I get after building Charm++ with "net-linux-x86_64 ibverbs" and trying to run simplearrayhello:
$ ./charmrun +p32 ./hello ++mpiexec
Charmrun> IBVERBS version of charmrun
Charmrun> started all node programs in 2.129 seconds.
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: Failed to change qp state to RTS: you may need some device-specific parameters in machine-ibevrbs
...(32x)...
[0] Stack Traceback:
  [0:0] CmiAbort+0x40 [0x54bac0]
  [0:1] initInfiOtherNodeData+0x168 [0x54bfd8]
  [0:2] ConverseInit+0xe8a [0x5569fa]
  [0:3] main+0x26 [0x4857e6]
  [0:4] __libc_start_main+0xfd [0x2abbb434cd5d]
  [0:5] [0x47ffd9]
Fatal error on PE 0> Failed to change qp state to RTS: you may need some device-specific parameters in machine-ibevrbs
And here is what I get after building with "net-linux-x86_64 ibverbs smp":
$ ./charmrun +p32 ./hello ++mpiexec
Charmrun> IBVERBS version of charmrun
Charmrun> started all node programs in 0.856 seconds.
Charmrun: error on request socket--
Socket closed before recv.
Any other clue as to what I'm still missing?
Thanks,
Jozsef
On Mon, Feb 16, 2015 at 8:57 PM, Abhinav Bhatele <bhatele AT illinoisalumni.org> wrote:
Hi Jozsef,
Please find some answers inline:
On Fri, Feb 13, 2015 at 8:19 AM, Jozsef Bakosi <jbakosi AT gmail.com> wrote:
Hi folks,
I'm wondering what the best way is to run Charm++ applications on clusters with InfiniBand interconnects. So far I have successfully been building and running my app with Charm++ built by the following command, which uses MPI:
./build AMPI mpi-linux-x86_64 mpicxx
But now I'm wondering if the "ibverbs" build option provides better performance on InfiniBand clusters. We have QLogic and Mellanox InfiniBand fat-tree interconnects. To experiment with this, I have successfully built Charm++ using the following command:
./build AMPI net-linux-x86_64 ibverbs
But when I try to run net-linux-x86_64-ibverbs/tests/charm++/simplearrayhello on two compute nodes, I get
$ ./charmrun +p32 ./hello
Charmrun> IBVERBS version of charmrun
mcmd: connect failed: Connection refused (32x)
Charmrun> Error 1 returned from rsh (localhost:0)
So my questions are:
1. Can I expect better performance on Infiniband clusters using build options other than MPI?
Yes, typically you would expect the ibverbs build to perform better than the MPI build. You can try the four builds below (full build commands are sketched after the list):
mpi-linux-x86_64 mpicxx
mpi-linux-x86_64 mpicxx smp
net-linux-x86_64 ibverbs
net-linux-x86_64 ibverbs smp
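For reference, those option strings correspond to full build commands of roughly this form (assuming the AMPI target used above; compiler options may differ on your site):
$ ./build AMPI mpi-linux-x86_64 mpicxx
$ ./build AMPI mpi-linux-x86_64 mpicxx smp
$ ./build AMPI net-linux-x86_64 ibverbs
$ ./build AMPI net-linux-x86_64 ibverbs smp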
2. Do I also have to contact our system admins to get access to interconnect software layers lower than MPI, so that the Charm++ code (I assume ibverbs) can use them?
No, as Jim pointed out, you can use ++mpiexec or manually specify the nodelist that has been allocated to you (a sample nodelist is sketched below, after the remaining questions).
3. Am I missing something else?
4. Are the best ways to build Charm++ for specific hardware documented somewhere?
Hopefully, someone else will answer this but my guess is no.
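To illustrate the nodelist route mentioned in the answer to question 2 above (a sketch only; the hostnames and the ./nodelist filename are placeholders, and ++mpiexec avoids the file entirely by launching through the MPI job starter):
group main
host node001
host node002
$ ./charmrun +p32 ./hello ++nodelist ./nodelist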
Thanks in advance, and please let me know if you need more information on the clusters.
Jozsef
--
Abhinav Bhatele, people.llnl.gov/bhatele
Center for Applied Scientific Computing, Lawrence Livermore National Laboratory