charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
Re: [charm] Use of uninitialised value of size 8 from CkReductionMsg::buildNew()
- From: Jozsef Bakosi <jbakosi AT gmail.com>
- To: Phil Miller <mille121 AT illinois.edu>, "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
- Subject: Re: [charm] Use of uninitialised value of size 8 from CkReductionMsg::buildNew()
- Date: Tue, 22 Nov 2016 22:12:20 -0700
On Tue, Nov 22, 2016 at 9:40 PM, Phil Miller <mille121 AT illinois.edu> wrote:
Some clarifying/exploratory questions (which should be pretty generally applicable):
- Which system do you observe this on?
LANL's trinitite (a smaller version of trinity). This is a Cray XC40 with KNLs. I don't have all the details handy, but I suspect its hardware and software stack is very similar to cori's.
- What build configuration are you using on the Cray system? gni-crayxc? mpi-crayxc? smp or no? The full ./build command line would be useful, along with output from 'module list'
No smp. The build command and modules:
$ build charm++ mpi-crayxc --with-production -j40 -O3 -DNDEBUG
Currently Loaded Modulefiles:
1) modules/3.2.10.4       6) craype/2.5.6         11) pmi/5.0.10-1.0000.11050.0.0.ari  16) dvs/2.7_0.9.0-2.201  21) cmake/3.6.2
2) eswrap/2.0.6-2.9       7) cray-mpich/7.4.2     12) dmapp/7.1.0-12.37                17) alps/6.1.6-20.1      22) craype-hugepages8M
3) gcc/6.1.0              8) cray-libsci/16.07.1  13) gni-headers/5.0.7-3.1            18) rca/1.0.0-6.21       23) cray-netcdf-hdf5parallel/4.4.1
4) craype-haswell         9) udreg/2.3.2-4.6      14) xpmem/0.1-4.5                    19) atp/2.0.2            24) cray-hdf5-parallel/1.10.0
5) craype-network-aries  10) ugni/6.0.12-2.1      15) job/1.5.5-3.58                   20) PrgEnv-gnu/6.0.3
- Outside of valgrind, do you otherwise observe failures on the Cray?
No, though the only other application I have tried is my unit-test harness, which is also a parallel Charm++ program but uses neither reductions nor groups.
- Can you reproduce this with a maximally simplified build on the Cray? E.g. without smp, and on whichever network layer (gni or mpi) you're not currently using?
I'm using a non-smp cray-mpi build.
- How many nodes and PEs does this take to reproduce? How few can you use? Does it reproduce on just 1 PE?
I can reproduce it using a single core with charmrun +p1.
- Can you reproduce this in a smaller, simplified test program? Alternately, can you point us to the code in your repository and a set of inputs and command line arguments that reproduces it?
I will try to reproduce this tomorrow on a simpler example. I'll get back to you on this.
I have also noticed a similar problem, this time producing an invalid write of size 8 instead of a read, when contributing not an array of doubles but a uint64_t using CkReduction::sum_int. I guess this latter case with uint64_t is already a type-length problem, so I would expect it to fail everywhere, yet it does so only on Cray.
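To make the suspected type-length mismatch concrete, here is a minimal standalone sketch. It does not use Charm++ at all: `elements_seen` and `as_ints` are hypothetical helpers, and the sketch assumes that an int-typed reducer such as CkReduction::sum_int interprets the contributed bytes as an array of `int`. It only shows how many elements such a reducer would see in the payload of a single uint64_t; it does not claim to explain the invalid write.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: number of elements a reducer would see if it
// interprets a payload of nBytes as an array of T.
template <typename T>
std::size_t elements_seen(std::size_t nBytes) {
    return nBytes / sizeof(T);
}

// Hypothetical stand-in for an int-typed reducer reading its input:
// reinterprets the raw payload as ints (an assumption about how an
// int reducer treats its data, not the Charm++ implementation).
std::vector<int> as_ints(const void* data, std::size_t nBytes) {
    assert(data != nullptr || nBytes == 0);
    std::vector<int> out(nBytes / sizeof(int));
    if (!out.empty()) {
        std::memcpy(out.data(), data, out.size() * sizeof(int));
    }
    return out;
}
```

On a platform where `int` is 4 bytes, `elements_seen<int>(sizeof(std::uint64_t))` is 2, so one 8-byte value would be treated as two separate ints rather than one 64-bit integer.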
More tomorrow. Thanks, Phil.
On Tue, Nov 22, 2016 at 10:25 AM, Jozsef Bakosi <jbakosi AT gmail.com> wrote:
Hi folks,
I'm getting the following valgrind message only on Cray (no problem on, e.g., linux/mac):
==48771== Use of uninitialised value of size 8
==48771== at 0x21696E61: memcpy (memcpy.S:201)
==48771== by 0x212E12E0: CkReductionMsg::buildNew(int, void const*, CkReduction::reducerType, CkReductionMsg*)
==48771== by 0x212EF68A: Group::contribute(int, void const*, CkReduction::reducerType, CkCallback const&, unsigned short)
This is from a chare group reduction of an array of doubles with CkReduction::sum_double. There is a single memcpy() in src/ck-core/ckreduction.C:1501:
memcpy(ret->data,srcData,NdataSize);
I suspect the memory allocated behind srcData is smaller (by a single double) than NdataSize, probably because I'm feeding the wrong data size. I pass the data size to the contribute call as static_cast<int>( vec.size() * sizeof(double) ), and the data pointer is vec.data(), which I assume ends up being passed on as srcData. Here vec is a std::vector<double>. I believe this should be correct, but for some reason this is a segfault only on Cray; valgrind does not even complain on linux or mac.
Does anyone have an idea how I can debug this?
Thanks,
Jozsef
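The size computation described in the message above can be checked in isolation. The following is a minimal sketch without any Charm++ dependency: `payload_bytes` reproduces the byte count from the message, and `copy_payload` is a hypothetical stand-in for the memcpy step, not the code in ckreduction.C. It assumes the contributed vector is non-empty and that the byte count fits in an `int`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Byte count as computed in the message: element count times element size.
int payload_bytes(const std::vector<double>& vec) {
    return static_cast<int>(vec.size() * sizeof(double));
}

// Hypothetical stand-in for the copy step: allocates exactly nBytes and
// copies that many bytes from srcData. Reading nBytes from srcData is
// only valid if the source really owns at least nBytes.
std::vector<unsigned char> copy_payload(int nBytes, const void* srcData) {
    assert(nBytes >= 0);
    std::vector<unsigned char> buf(static_cast<std::size_t>(nBytes));
    if (nBytes > 0) {
        std::memcpy(buf.data(), srcData, buf.size());
    }
    return buf;
}
```

With `std::vector<double> vec{1.0, 2.0, 3.0}`, calling `copy_payload(payload_bytes(vec), vec.data())` copies exactly `3 * sizeof(double)` bytes, which is the full extent of the vector's storage and no more. If a standalone check like this is clean under valgrind, the size passed in is at least consistent with the source buffer.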