charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
- From: Scott Field <sfield AT astro.cornell.edu>
- To: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
- Subject: [charm] memory management errors after ckexit called
- Date: Tue, 16 Jun 2015 16:14:23 -0400
- List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
- List-id: CHARM parallel programming system <charm.cs.uiuc.edu>
Hi,
After recently pulling a bleeding-edge version of the Charm++ code, all of our regression tests fail with either a segmentation fault or "double free or corruption (!prev): 0x0000000001c4de20 ***". The error appears to occur after CkExit is called. Charm++ was built on my laptop with
>>> ./build charm++ multicore-linux32 gcc --with-production -j3 -std=c++11
Using git's bisect utility, I was able to track down the first commit where things go wrong. Its hash and commit message are c96750026bbc7a9190f1381e7ac9ea56ae86f80e and "Bug #695: disable comm thread in multicore builds". More specifically, if I change line 200 of src/arch/util/machine-common-core.c from "#define CMK_SMP_NO_COMMTHD CMK_MULTICORE" to "#define CMK_SMP_NO_COMMTHD 0", the error message goes away and all tests pass again.
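The workaround amounts to flipping a compile-time switch. Below is a minimal sketch of how such a macro gates code, assuming that CMK_MULTICORE expands to 1 in multicore builds and that a nonzero CMK_SMP_NO_COMMTHD means "no dedicated communication thread" (both inferred from the macro name and the commit message, not checked against the Charm++ sources). The fallback definition and the helper comm_thread_enabled() are hypothetical, added only for illustration.

```cpp
// Sketch only: macro names come from the report; values are assumptions.
#ifndef CMK_MULTICORE
#define CMK_MULTICORE 1  // assumption: a multicore-linux* build sets this to 1
#endif

// The line as it reads after the "Bug #695" commit: in multicore builds the
// macro becomes nonzero, which (by assumption) disables the comm thread.
// The workaround in the report replaces CMK_MULTICORE here with 0.
#define CMK_SMP_NO_COMMTHD CMK_MULTICORE

// Hypothetical helper: true when this build keeps a dedicated comm thread.
constexpr bool comm_thread_enabled() {
#if CMK_SMP_NO_COMMTHD
  return false;  // macro is nonzero: no comm thread
#else
  return true;   // macro is 0: comm thread kept, as with the workaround
#endif
}
```

Compiling the same snippet with -DCMK_MULTICORE=0, or editing the define to 0 as in the report, makes comm_thread_enabled() return true.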
Honestly, I don't really know why this change fixes the problem; it's pretty far under the hood.
A few questions:
1) Is this list an appropriate place to post information about potential bugs?
2) Does this seem to be a Charm++ bug introduced by that commit, or a fix that has simply broken our code? I had a hard time tracking down the source of the error. Oddly enough, I could not reproduce the error under valgrind (although it did report an "Uninitialised value was created by a stack allocation", which it traced to one of the declaration files generated by charmc). With MALLOC_CHECK_ set to 3, I get the following:
*** Error in `./Evolve1DScalarWave': free(): invalid pointer: 0x000000000203c920 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7338f)[0x7f4cebc2e38f]
/lib/x86_64-linux-gnu/libc.so.6(+0x81fb6)[0x7f4cebc3cfb6]
/lib/x86_64-linux-gnu/libc.so.6(+0x3c280)[0x7f4cebbf7280]
/lib/x86_64-linux-gnu/libc.so.6(+0x3c2a5)[0x7f4cebbf72a5]
./Evolve1DScalarWave[0x670b4a]
./Evolve1DScalarWave[0x5e39ed]
./Evolve1DScalarWave(CsdScheduleForever+0x48)[0x673e88]
./Evolve1DScalarWave(CsdScheduler+0x2d)[0x67413d]
./Evolve1DScalarWave(_ZN12ElementChareI16ScalarWaveSystemILi1EEE11endTimeStepEv+0x448)[0x580d3c]
./Evolve1DScalarWave(_ZN12ElementChareI16ScalarWaveSystemILi1EEE13endComputeRhsEv+0x5331DScalarWave': free(): invalid pointer: 0x000000000203c920 ***
Best,
Scott
- [charm] memory management errors after ckexit called, Scott Field, 06/16/2015
- Re: [charm] memory management errors after ckexit called, Phil Miller, 06/16/2015
- Re: [charm] memory management errors after ckexit called, Scott Field, 06/17/2015
- Re: [charm] memory management errors after ckexit called, Phil Miller, 06/19/2015
- Re: [charm] memory management errors after ckexit called, Phil Miller, 06/20/2015
- Re: [charm] memory management errors after ckexit called, Scott Field, 06/21/2015
- Re: [charm] memory management errors after ckexit called, Phil Miller, 06/21/2015
- Re: [charm] memory management errors after ckexit called, Scott Field, 06/21/2015