charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
List archive
- From: Ted Packwood <malice AT cray.com>
- To: <charm AT lists.cs.illinois.edu>
- Subject: [charm] NAMD segfaulting in "setJcontext "
- Date: Wed, 31 Aug 2016 13:54:23 -0500
Hello,

I'm trying to determine what is causing a failure in NAMD when built with the Cray compiler on a Cray XC30. The failure is in "setJcontext", as you can see from the traceback below. The Charm++ build works fine with the included Charm++ test "jacobi3d" (4 ranks on 4 separate Broadwell chips, +ppn6).

I built Charm++ with:

    ./build charm++ mpi-crayxc craycc smp

NAMD was built with:

    ./config CRAY-XC-cce --charm-arch mpi-crayxc.cce-smp-craycc --with-fftw3 --without-tcl --charm-opts -save

and was run with just one rank on a Broadwell chip. The Intel compiler build of Charm++ and NAMD works fine, so this appears to be an issue with the Cray compiler.

I have a few questions:

1) Does anyone have an idea of what might cause this type of failure?
2) Any suggestions as to a possible solution, or build changes that might solve the problem?
3) Is there a simple Charm++ test that mimics the Jcontext usage NAMD requires and might cause a similar failure? I'd prefer to try to reproduce this with a smaller test than NAMD. :)
4) If not, should I contact the NAMD folks instead?

Core was generated by `./namd2.XC30.IVB.kay.PE604.cce853-g-O0-flex_mp-strict.mpich743.libsci16091.fftw'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00000000212c43eb in raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
37      ../nptl/sysdeps/unix/sysv/linux/pt-raise.c: No such file or directory.
(gdb) where
#0  0x00000000212c43eb in raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#1  0x0000000021417dc5 in abort () at abort.c:99
#2  0x0000000021173c72 in MPID_Abort ()
#3  0x0000000021123a75 in PMPI_Abort ()
#4  0x0000000020e75cd5 in LrtsAbort () at machine.c:1656
#5  <signal handler called>
#6  0x0000000020e71089 in setJcontext () at uJcontext.c:131
#7  0x0000000020e71100 in swapJcontext () at uJcontext.c:176
#8  0x0000000020e713b5 in CthResume () at libthreads-default.c:1669
#9  0x0000000020e78388 in CsdScheduleForever () at convcore.c:1901
#10 0x0000000020e78299 in CsdScheduler () at convcore.c:1837
#11 0x00000000200dd891 in BackEnd::suspend () at src/BackEnd.C:285
#12 0x0000000020b6e650 in ScriptTcl::suspend (this=0x41bcb1b0) at src/ScriptTcl.C:72
#13 0x0000000020b6e6ff in ScriptTcl::initcheck (this=0x41bcb1b0) at src/ScriptTcl.C:104
#14 0x0000000020b6e577 in ScriptTcl::run (this=0x41bcb1b0, scriptFile=0x7fffffff765e) at src/ScriptTcl.C:2076
#15 0x00000000200d6d49 in after_backend_init (argc=2, argv=0x7fffffff6678) at src/mainfunc.C:158
#16 0x00000000200dcff9 in slave_init (argc=2, argv=0x7fffffff6678) at src/BackEnd.C:140
#17 0x0000000020e745ec in ConverseRunPE$$CFE_id_d7e6ac3e_9d711be8 () at machine-common-core.c:1293
#18 0x0000000020e71b9a in call_startfn$$CFE_id_d7e6ac3e_9d711be8 () at machine-smp.c:415
#19 0x0000000020eeda44 in start_thread (arg=0x2aaaaad45700) at pthread_create.c:309
#20 0x00000000214756f9 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
- [charm] NAMD segfaulting in "setJcontext ", Ted Packwood, 08/31/2016
- Re: [charm] NAMD segfaulting in "setJcontext ", Phil Miller, 08/31/2016