- From: Jozsef Bakosi <jbakosi AT gmail.com>
- To: Steve Petruzza <spetruzza AT sci.utah.edu>
- Cc: charm <charm AT lists.cs.illinois.edu>
- Subject: Re: [charm] Issues trying to run mpi-coexist example
- Date: Wed, 22 Jun 2016 08:12:20 -0600
Hi Steve,
Charm++ developers, please correct me if I'm wrong...
To interoperate with MPI codes or libraries, you need to build Charm++ on top of the MPI backend. On a Mac I do that with:
$ build charm++ mpi-darwin-x86_64
On Linux, I do:
$ build charm++ mpi-linux-x86_64 mpicxx
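To then rebuild the mpi-coexist example against that MPI-layer build, the usual steps are roughly the following (a sketch; I'm assuming the example sits in its usual place under examples/charm++/mpi-coexist in the Charm++ source tree and that charm/bin points at the MPI-layer build; adjust paths to your checkout):

$ cd charm/examples/charm++/mpi-coexist
$ make
$ mpirun -np 4 ./multirun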
Jozsef
On Wed, Jun 22, 2016 at 5:13 AM, Steve Petruzza <spetruzza AT sci.utah.edu> wrote:
Hi,
I am trying to use the MPI interoperability with Charm++, starting from the mpi-coexist example.

I tried to build on my Mac (openmpi + multicore-darwin-x86_64-clang or netlrts-darwin-x86_64-smp-clang), but I cannot use

CharmLibInit(MPI_Comm userComm, int argc, char **argv);

because CMK_CONVERSE_MPI is set to 0 in mpi-interoperate.h (a usage sketch of this overload appears after this message). So I tried the other CharmLibInit, passing 0 as userComm, but it crashes on that call:

mpirun -np 4 ./multirun
——————————————
[Steve:72383] *** Process received signal ***
[Steve:72383] Signal: Segmentation fault: 11 (11)
[Steve:72383] Signal code: Address not mapped (1)
[Steve:72383] Failing at address: 0x0
[Steve:72383] [ 0] 0 libsystem_platform.dylib 0x00007fff8b9b652a _sigtramp + 26
[Steve:72383] [ 1] 0 ??? 0x0000000000000000 0x0 + 0
[Steve:72383] [ 2] [Steve:72384] *** Process received signal ***
[Steve:72384] Signal: Segmentation fault: 11 (11)
[Steve:72384] Signal code: Address not mapped (1)
[Steve:72384] Failing at address: 0x0
[Steve:72384] [ 0] 0 libsystem_platform.dylib 0x00007fff8b9b652a _sigtramp + 26
[Steve:72384] [ 1] 0 ??? 0x0000000000000000 0x0 + 0
[Steve:72384] [ 2] [Steve:72385] *** Process received signal ***
[Steve:72385] Signal: Segmentation fault: 11 (11)
[Steve:72385] Signal code: Address not mapped (1)
[Steve:72385] Failing at address: 0x0
[Steve:72385] [ 0] 0 libsystem_platform.dylib 0x00007fff8b9b652a _sigtramp + 26
[Steve:72385] [ 1] 0 ??? 0x0000000000000000 0x0 + 0
[Steve:72385] [ 2] [Steve:72382] *** Process received signal ***
[Steve:72382] Signal: Segmentation fault: 11 (11)
[Steve:72382] Signal code: Address not mapped (1)
[Steve:72382] Failing at address: 0x0
[Steve:72382] [ 0] 0 libsystem_platform.dylib 0x00007fff8b9b652a _sigtramp + 26
[Steve:72382] [ 1] 0 ??? 0x0000000000000000 0x0 + 0
[Steve:72382] [ 2] 0 multirun 0x0000000100118106 CmiAbortHelper + 38
[Steve:72382] [ 3] 0 multirun 0x00000001001147a0 CmiSyncBroadcastAllFn + 0
[Steve:72382] [ 4] 0 multirun 0x0000000100118106 CmiAbortHelper + 38
[Steve:72385] [ 3] 0 multirun 0x00000001001147a0 CmiSyncBroadcastAllFn + 0
[Steve:72385] [ 4] 0 multirun 0x0000000100111fd4 CharmLibInit + 36
[Steve:72385] [ 5] 0 multirun 0x0000000100111fd4 CharmLibInit + 36
[Steve:72382] [ 5] 0 multirun 0x0000000100118106 CmiAbortHelper + 38
[Steve:72383] [ 3] 0 multirun 0x00000001001147a0 CmiSyncBroadcastAllFn + 0
[Steve:72383] [ 4] 0 multirun 0x0000000100111fd4 CharmLibInit + 36
[Steve:72383] [ 5] 0 multirun 0x0000000100001633 main + 147
[Steve:72383] [ 6] 0 multirun 0x0000000100001574 start + 52
[Steve:72383] *** End of error message ***
0 multirun 0x0000000100118106 CmiAbortHelper + 38
[Steve:72384] [ 3] 0 multirun 0x00000001001147a0 CmiSyncBroadcastAllFn + 0
[Steve:72384] [ 4] 0 multirun 0x0000000100111fd4 CharmLibInit + 36
[Steve:72384] [ 5] 0 multirun 0x0000000100001633 main + 147
[Steve:72384] [ 6] 0 multirun 0x0000000100001633 main + 147
[Steve:72385] [ 6] 0 multirun 0x0000000100001574 start + 52
[Steve:72385] *** End of error message ***
0 multirun 0x0000000100001633 main + 147
[Steve:72382] [ 6] 0 multirun 0x0000000100001574 start + 52
[Steve:72382] *** End of error message ***
0 multirun 0x0000000100001574 start + 52
[Steve:72384] *** End of error message ***
——————————————

I guess this has to do with the build. How can I build Charm++ and this example on Mac in order to use the correct CharmLibInit with MPI_Comm?

Anyway, I tried the same on a Cray XC40 node (built correctly, using CharmLibInit with MPI_Comm), but:

If I run:

srun -N 1 -n 16 --hint=nomultithread --ntasks-per-socket=16 ./multirun
——————————————
Charm++> Running on Gemini (GNI) with 16 processes
Charm++> static SMSG
Charm++> SMSG memory: 79.0KB
Charm++> memory pool init block size: 8MB, total memory pool limit 0MB (0 means no limit)
Charm++> memory pool registered memory limit: 200000MB, send limit: 100000MB
Charm++> only comm thread send/recv messages
Charm++> Cray TLB page size: 2048K
Charm++> Running in SMP mode: numNodes 16, 1 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.7.0-202-g95e5ac0
Warning> using Isomalloc in SMP mode, you may need to run with '+isomalloc_sync'.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (64-way SMP).
——————————————

Here it hangs forever.

Then if I run:

srun -N 1 -n 16 --hint=nomultithread --ntasks-per-socket=16 ./multirun_time
——————————————
Charm++> Running on Gemini (GNI) with 16 processes
Charm++> static SMSG
Charm++> SMSG memory: 79.0KB
Charm++> memory pool init block size: 8MB, total memory pool limit 0MB (0 means no limit)
Charm++> memory pool registered memory limit: 200000MB, send limit: 100000MB
Charm++> only comm thread send/recv messages
Charm++> Cray TLB page size: 2048K
Charm++> Running in SMP mode: numNodes 16, 1 worker threads per process
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.7.0-202-g95e5ac0
Warning> using Isomalloc in SMP mode, you may need to run with '+isomalloc_sync'.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (64-way SMP).
Running Hi on 16 processors for 10 elements
Hi[1] from element 0
Hi[2] from element 1
Hi[3] from element 2
Hi[4] from element 3
Hi[5] from element 4
Hi[6] from element 5
Hi[7] from element 6
Hi[8] from element 7
Hi[9] from element 8
Hi[10] from element 9
——————————————

Also here it hangs forever.

Is there any parameter or flag I should add? (I already tried -envs PAMI_CLIENTS=MPI,Converse, without success.)

Thank you,
Steve
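For reference, with an MPI-layer build (where CMK_CONVERSE_MPI is 1) the communicator-taking overload of CharmLibInit becomes usable, and a host MPI program drives Charm++ roughly as below. This is a minimal sketch only: it assumes the CharmLibInit/CharmLibExit declarations from mpi-interoperate.h, and the actual Charm++ work is left as a placeholder for the library entry points the mpi-coexist example generates from its .ci modules.

#include <mpi.h>
#include "mpi-interoperate.h"   // declares CharmLibInit / CharmLibExit

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  // Give the Charm++ runtime its own communicator rather than MPI_COMM_WORLD
  // itself, so its traffic stays separate from the host program's.
  MPI_Comm charmComm;
  MPI_Comm_dup(MPI_COMM_WORLD, &charmComm);

  // Only usable when Charm++ was built on the MPI layer
  // (e.g. mpi-darwin-x86_64 or mpi-crayxc), where CMK_CONVERSE_MPI == 1.
  CharmLibInit(charmComm, argc, argv);

  // ... call Charm++ library entry points here (in mpi-coexist these are
  //     generated from its .ci modules, e.g. the Hi library) ...

  MPI_Barrier(MPI_COMM_WORLD);   // ordinary MPI calls still work in between

  CharmLibExit();                // shut down the Charm++ runtime first
  MPI_Comm_free(&charmComm);
  MPI_Finalize();
  return 0;
}

Such a program is typically compiled and linked with charmc from the MPI-layer build so that the interoperation runtime and the Charm++ library modules are pulled in.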
- [charm] Issues trying to run mpi-coexist example, Steve Petruzza, 06/22/2016
- Re: [charm] Issues trying to run mpi-coexist example, Jozsef Bakosi, 06/22/2016
- Re: [charm] Issues trying to run mpi-coexist example, Sam White, 06/22/2016
- Re: [charm] Issues trying to run mpi-coexist example, Steve Petruzza, 06/22/2016
- Re: [charm] Issues trying to run mpi-coexist example, Phil Miller, 06/22/2016
- Re: [charm] Issues trying to run mpi-coexist example, Phil Miller, 06/22/2016
- Re: [charm] Issues trying to run mpi-coexist example, Steve Petruzza, 06/22/2016
- Re: [charm] Issues trying to run mpi-coexist example, Phil Miller, 06/22/2016
- Re: [charm] Issues trying to run mpi-coexist example, Steve Petruzza, 06/22/2016
- Re: [charm] Issues trying to run mpi-coexist example, Phil Miller, 06/23/2016
- Re: [charm] Issues trying to run mpi-coexist example, Steve Petruzza, 06/23/2016
- Re: [charm] Issues trying to run mpi-coexist example, Phil Miller, 06/23/2016
- Re: [charm] Issues trying to run mpi-coexist example, Steve Petruzza, 06/22/2016
- Re: [charm] Issues trying to run mpi-coexist example, Phil Miller, 06/22/2016
- Re: [charm] Issues trying to run mpi-coexist example, Phil Miller, 06/22/2016
- Re: [charm] Issues trying to run mpi-coexist example, Steve Petruzza, 06/22/2016
- Re: [charm] Issues trying to run mpi-coexist example, Sam White, 06/22/2016