charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
- From: Shad Kirmani <sxk5292 AT cse.psu.edu>
- To: charm AT cs.uiuc.edu, Jason Holmes <jholmes AT psu.edu>, Padma Raghavan <raghavan AT cse.psu.edu>
- Subject: [charm] Fwd: backtrace of ChaNGa process
- Date: Mon, 26 Mar 2012 13:22:32 -0400
- List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
- List-id: CHARM parallel programming system <charm.cs.uiuc.edu>
Hello,
Sometimes, when ChaNGa is compiled for ibverbs, the processes hang for a long time right at the start of the job. A backtrace of one of the processes looks like this:
#0 0x00000038daa0b795 in pthread_spin_lock () from /lib64/libpthread.so.0
#1 0x00002b93ecee7a7b in ibv_cmd_create_qp () from /usr/lib64/libmlx4-rdmav2.so
#2 0x000000000061add0 in recvBarrierMessage ()
#3 0x000000000061b882 in CmiBarrier ()
#4 0x00000000006206ec in CmiTimerInit ()
#5 0x00000000006216ec in ConverseCommonInit ()
#6 0x000000000061d723 in ConverseInit ()
#7 0x00000000005afd4c in main ()
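When many processes hang at once, it can help to check whether each one is stuck at the same point. The following is a minimal Python sketch, not part of Charm++ or ChaNGa; the names `parse_backtrace`, `call_chain`, and `Frame` are hypothetical. It assumes frames in the unsymbolized form shown above (no arguments or source locations) and also accepts gdb's habit of wrapping a long frame so that `from <library>` lands on its own line.

```python
import re
from typing import List, NamedTuple, Optional

# One gdb frame without arguments, e.g. "#3 0x000000000061b882 in CmiBarrier ()",
# optionally followed by "from /path/to/lib.so" on the same line.
FRAME_RE = re.compile(
    r"^#(?P<num>\d+)\s+(?P<addr>0x[0-9a-fA-F]+)\s+in\s+(?P<func>\S+)\s*\(\)"
    r"(?:\s+from\s+(?P<lib>\S+))?"
)
# gdb may wrap a long frame so that "from <lib>" lands on the next line.
FROM_RE = re.compile(r"^\s*from\s+(?P<lib>\S+)")


class Frame(NamedTuple):
    num: int
    addr: int
    func: str
    lib: Optional[str]


def parse_backtrace(text: str) -> List[Frame]:
    """Parse gdb 'bt' output into frames, innermost (#0) first."""
    frames: List[Frame] = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append(Frame(int(m["num"]), int(m["addr"], 16), m["func"], m["lib"]))
            continue
        m = FROM_RE.match(line)
        if m and frames and frames[-1].lib is None:
            # Attach a wrapped "from <lib>" line to the frame just above it.
            frames[-1] = frames[-1]._replace(lib=m["lib"])
    return frames


def call_chain(frames: List[Frame]) -> str:
    """Render function names outermost-first, e.g. 'main > ... > pthread_spin_lock'."""
    return " > ".join(f.func for f in reversed(frames))


# The backtrace from the report, with gdb's wrapped "from" line kept as-is.
SAMPLE_BT = """\
#0 0x00000038daa0b795 in pthread_spin_lock () from /lib64/libpthread.so.0
#1 0x00002b93ecee7a7b in ibv_cmd_create_qp ()
from /usr/lib64/libmlx4-rdmav2.so
#2 0x000000000061add0 in recvBarrierMessage ()
#3 0x000000000061b882 in CmiBarrier ()
#4 0x00000000006206ec in CmiTimerInit ()
#5 0x00000000006216ec in ConverseCommonInit ()
#6 0x000000000061d723 in ConverseInit ()
#7 0x00000000005afd4c in main ()
"""

if __name__ == "__main__":
    print(call_chain(parse_backtrace(SAMPLE_BT)))
```

Run on the backtrace above, this prints `main > ConverseInit > ConverseCommonInit > CmiTimerInit > CmiBarrier > recvBarrierMessage > ibv_cmd_create_qp > pthread_spin_lock`. Comparing such chains across all hung processes would show whether every process is waiting in the same `CmiBarrier` call made from `CmiTimerInit`.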
With the verbose flag added to charmrun, the hang occurs right after it says that all nodes are connected:
...
Charmrun> Waiting for 62-th client to connect.
Charmrun> Waiting for 63-th client to connect.
Charmrun> All clients connected.
Charmrun> IP tables sent.
Charmrun> node programs all connected
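To confirm across repeated runs where charmrun's verbose output stopped, a saved log can be checked for its last `Charmrun>` message. This is a hypothetical Python helper, not a Charm++ tool; it assumes one message per line with the exact `Charmrun> ` prefix seen above. A run whose last charmrun message is `node programs all connected` stalled after charmrun's own connection phase, which is consistent with the backtrace pointing into `ConverseInit`.

```python
from typing import Iterable, Optional

# Prefix charmrun puts on its verbose status messages, as seen in the log above.
PREFIX = "Charmrun> "
# Last message charmrun printed before the reported hang.
HANDSHAKE_DONE = "node programs all connected"


def last_charmrun_stage(log_lines: Iterable[str]) -> Optional[str]:
    """Return the text of the last 'Charmrun> ' message, or None if none appears."""
    last = None
    for line in log_lines:
        line = line.strip()
        if line.startswith(PREFIX):
            last = line[len(PREFIX):]
    return last


def stalled_after_handshake(log_lines: Iterable[str]) -> bool:
    """True if the last charmrun message reports all node programs connected."""
    return last_charmrun_stage(log_lines) == HANDSHAKE_DONE


if __name__ == "__main__":
    sample = [
        "Charmrun> Waiting for 63-th client to connect.",
        "Charmrun> All clients connected.",
        "Charmrun> IP tables sent.",
        "Charmrun> node programs all connected",
    ]
    print(last_charmrun_stage(sample), stalled_after_handshake(sample))
```

Note that lines without the prefix (for example, program output) are ignored, so the check only looks at charmrun's own messages.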
We did not see these hangs when ChaNGa was compiled for MPI-linux-x86_64 instead of net-linux-x86_64 with ibverbs. When the hang occurs, it either clears after a while and the job then runs, or it lasts long enough that we give up and kill the job.
This is on a Red Hat Enterprise Linux 5 system using libibverbs-1.1.3-2.
Thanks,
Shad
- [charm] Fwd: backtrace of ChaNGa process, Shad Kirmani, 03/26/2012
- Re: [charm] [ppl] Fwd: backtrace of ChaNGa process, Pritish Jetley, 03/26/2012
- Re: [charm] [ppl] Fwd: backtrace of ChaNGa process, Shad Kirmani, 03/26/2012
- Re: [charm] [ppl] Fwd: backtrace of ChaNGa process, Pritish Jetley, 03/26/2012
- Re: [charm] [ppl] Fwd: backtrace of ChaNGa process, Shad Kirmani, 03/27/2012
- Re: [charm] [ppl] Fwd: backtrace of ChaNGa process, Phil Miller, 03/27/2012
- Re: [charm] [ppl] Fwd: backtrace of ChaNGa process, Shad Kirmani, 03/27/2012
- Re: [charm] [ppl] Fwd: backtrace of ChaNGa process, Pritish Jetley, 03/27/2012
- Re: [charm] [ppl] Fwd: backtrace of ChaNGa process, Shad Kirmani, 03/27/2012
- Re: [charm] [ppl] Fwd: backtrace of ChaNGa process, Shad Kirmani, 03/27/2012
- Re: [charm] [ppl] Fwd: backtrace of ChaNGa process, Pritish Jetley, 03/27/2012
- Re: [charm] [ppl] Fwd: backtrace of ChaNGa process, Shad Kirmani, 03/27/2012
- Re: [charm] [ppl] Fwd: backtrace of ChaNGa process, Phil Miller, 03/27/2012
- Re: [charm] [ppl] Fwd: backtrace of ChaNGa process, Shad Kirmani, 03/27/2012
- Re: [charm] [ppl] Fwd: backtrace of ChaNGa process, Pritish Jetley, 03/26/2012
- Re: [charm] [ppl] Fwd: backtrace of ChaNGa process, Shad Kirmani, 03/26/2012
- Re: [charm] [ppl] Fwd: backtrace of ChaNGa process, Pritish Jetley, 03/26/2012
Archive powered by MHonArc 2.6.16.