charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
List archive
- From: Nitin Bhat <nitin AT hpccharm.com>
- To: "Ortega, Bob" <bobo AT mail.smu.edu>
- Cc: "charm AT cs.illinois.edu" <charm AT cs.illinois.edu>
- Subject: Re: [charm] Charm++/Converse library build errors
- Date: Mon, 26 Oct 2020 16:14:38 -0500
Hi Bob,
Thanks for your email.
Can you confirm your UCX version by sending us the output of 'ucx_info -v'?
I think you’re correct in your assessment. When I tried an older UCX version (1.3), I was able to reproduce your issue. However, from UCX 1.4 onwards, those functions (ucp_get_nb and ucp_put_nb) are part of the UCX API, and charm builds correctly with the UCX backend. If it’s available, could you try upgrading to hpcx 2.2 or above? Alternatively, you can use standalone builds of UCX (and Open MPI) directly rather than the hpcx bundle.
It’s also important to note that there was a known bug (a hang) in UCX versions before 1.9. Although it has so far been reproduced on only one machine (Frontera @ TACC), it did affect a few specific runs of NAMD, and hence it could affect your runs as well. Since the recently released UCX version 1.9 fixes that bug, I recommend that you use that version. It looks like hpcx 2.7 ships with that version of UCX.
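The two version thresholds above (1.4 for the ucp_put_nb/ucp_get_nb functions, 1.9 for the hang fix) can be checked with a small script. This is a minimal sketch, not part of Charm++ or UCX: the helper names version_ge, extract_version and classify_ucx are hypothetical, it assumes GNU sort (for -V), and it assumes the first X.Y or X.Y.Z number printed by 'ucx_info -v' is the UCX version.

```shell
#!/bin/sh
# Sketch: classify a UCX version against the thresholds discussed above.
# All function names here are hypothetical helpers, not UCX or Charm++ tools.

# version_ge A B: succeeds when version A >= version B (needs GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# extract_version: print the first X.Y or X.Y.Z number found on stdin.
# Assumption: in `ucx_info -v` output, that first number is the UCX version.
extract_version() {
    grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# classify_ucx VERSION: print which situation from the email applies.
classify_ucx() {
    if [ -z "$1" ]; then
        echo "unknown: could not determine UCX version"
        return 1
    elif ! version_ge "$1" 1.4; then
        echo "too old: ucp_put_nb/ucp_get_nb unavailable before 1.4"
    elif ! version_ge "$1" 1.9; then
        echo "builds, but predates the hang fix in 1.9"
    else
        echo "ok: 1.9 or newer"
    fi
}

# Only query UCX when ucx_info is actually on PATH.
if command -v ucx_info >/dev/null 2>&1; then
    classify_ucx "$(ucx_info -v | extract_version)"
fi
```

The comparison goes through sort -V rather than a plain string test so that a version such as 1.10 is ordered after 1.9.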
Additionally, when you use a UCX build, make sure charm picks up the UCX libraries from the right place by passing the path to the UCX build directory to the charm build command using '--basedir=<ucx-base-dir>'. You would also need to have <ucx-base-dir>/lib in your LD_LIBRARY_PATH when you run the compiled binary.
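Put together, the build-time and run-time steps might look like the sketch below. The value of UCX_BASE_DIR is a hypothetical placeholder for <ucx-base-dir>. The build line is the command from the original report with the --basedir option added, and it only runs when executed from a charm source directory.

```shell
#!/bin/sh
# Hypothetical placeholder for <ucx-base-dir>; set this to the real UCX location.
UCX_BASE_DIR="${UCX_BASE_DIR:-/path/to/ucx}"

# Build time: point the charm build at the UCX installation.
# Skipped unless ./build exists, i.e. unless run from the charm source directory.
if [ -x ./build ]; then
    ./build charm++ ucx-linux-x86_64 icc ompipmix --with-production \
        --basedir="$UCX_BASE_DIR"
fi

# Run time: make the UCX shared libraries visible to the compiled binary.
# The ${VAR:+...} form avoids a trailing colon when LD_LIBRARY_PATH was empty.
export LD_LIBRARY_PATH="$UCX_BASE_DIR/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

The export has to be in effect in the environment that launches the binary, for example in the job script.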
Another comment/question is about the charm build. Is there a reason you’re not using the SMP mode? For larger runs, that mode shows significantly improved performance over the non-SMP mode (your current build command).
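For reference, an SMP build is requested by adding the smp option to the build command. The sketch below assumes the other options stay exactly as in the original command; the variable names are hypothetical and only there to show what changes.

```shell
#!/bin/sh
CHARM_ARCH="ucx-linux-x86_64"
BUILD_OPTS="icc ompipmix"          # options from the original, non-SMP command
SMP_BUILD_OPTS="$BUILD_OPTS smp"   # same options plus the smp build option

# Skipped unless run from the charm source directory.
if [ -x ./build ]; then
    # $SMP_BUILD_OPTS is intentionally unquoted so it splits into separate words.
    ./build charm++ "$CHARM_ARCH" $SMP_BUILD_OPTS --with-production
fi
```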
Let us know if you have any additional questions.
Thanks,
Nitin
On Oct 23, 2020, at 9:14 AM, Ortega, Bob <bobo AT mail.smu.edu> wrote:

Following the NAMD 2.14 Release Notes, I am attempting to build and test the Charm++/Converse library, Infiniband UCX OpenMPI PMIx version.

Using the following command produces errors,

./build charm++ ucx-linux-x86_64 icc ompipmix --with-production

Resulting in,

Performing '/usr/bin/gmake charm++ OPTS=-optimize -production QUIET=' in ucx-linux-x86_64-ompipmix-icc/tmp
/usr/bin/gmake -C libs/ck-libs/completion
gmake[1]: Entering directory `/users/bobo/NAMD/NAMD_2.14_Source/charm-6.10.2/ucx-linux-x86_64-ompipmix-icc/tmp/libs/ck-libs/completion'
gmake[1]: Nothing to be done for `all'.
gmake[1]: Leaving directory `/users/bobo/NAMD/NAMD_2.14_Source/charm-6.10.2/ucx-linux-x86_64-ompipmix-icc/tmp/libs/ck-libs/completion'
SRCBASE=../../src ./commitid.sh
Dev mode
../bin/charmc -optimize -production -I. -o machine.o machine.C
In file included from machine.C(719):
machine-onesided.C(125): error: identifier "ucp_put_nb" is undefined
statusReq = ucp_put_nb(ep, ncpyOpInfo->srcPtr,
^
In file included from machine.C(719):
machine-onesided.C(136): error: identifier "ucp_get_nb" is undefined
statusReq = ucp_get_nb(ep, (void*)ncpyOpInfo->destPtr,
^
compilation aborted for machine.C (code 2)
Fatal Error by charmc in directory /users/bobo/NAMD/NAMD_2.14_Source/charm-6.10.2/ucx-linux-x86_64-ompipmix-icc/tmp
Command icpc -fpic -DCMK_GFORTRAN -I../bin/../include -I/usr/include/ -I./proc_management/ -I./proc_management/simple_pmi/ -D__CHARMC__=1 -I. -O2 -fno-stack-protector -c machine.C -o machine.o returned error code 2
charmc exiting...
gmake: *** [machine.o] Error 1
-------------------------------------------------
Charm++ NOT BUILT. Either cd into ucx-linux-x86_64-ompipmix-icc/tmp and try
to resolve the problems yourself, visit
for more information. Otherwise, email the developers at charm AT cs.illinois.edu

I have consulted with Mellanox thinking this might be an hpcx toolkit related problem. Their assessment is that it is not a Mellanox issue, but perhaps an hpcx version issue. We are using version 2.1 hpcx and Mellanox notes that ucp_put_nb and ucp_get_nb exist in newer HPCX/UCX versions, 2.2 or higher.

Would you agree that this could be the issue or is there another way to resolve the errors?

Thank you,
Bob
- [charm] Charm++/Converse library build errors, Ortega, Bob, 10/23/2020
- Re: [charm] Charm++/Converse library build errors, Nitin Bhat, 10/26/2020
- Re: [charm] Charm++/Converse library build errors, Ortega, Bob, 10/26/2020
- Re: [charm] Charm++/Converse library build errors, Nitin Bhat, 10/29/2020
- Re: [charm] Charm++/Converse library build errors, Ortega, Bob, 10/26/2020
- Re: [charm] Charm++/Converse library build errors, Nitin Bhat, 10/26/2020