- From: "Van Der Wijngaart, Rob F" <rob.f.van.der.wijngaart AT intel.com>
- To: Sam White <white67 AT illinois.edu>
- Cc: "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
- Subject: [charm] Adaptive MPI
- Date: Wed, 23 Nov 2016 08:19:33 +0000
- Accept-language: en-US
Hello Team,
I am trying to troubleshoot my Adaptive MPI code, which uses dynamic load balancing. It crashes with a segmentation fault in AMPI_Migrate. I verified that dchunkpup (the pup routine I supplied) is called from within AMPI_Migrate and finishes on all ranks, so the crash does not happen there. That is not to say the routine is correct; it could have corrupted memory that is only touched later. To rule that out I gutted it, so that it only queries and prints the MPI rank of each rank entering it; a sketch follows below. I also added graceful exit code after the call to AMPI_Migrate, but that is evidently never reached.

I understand that this information is not enough for you to identify the problem, but at present I don't know where to start, since the error occurs in code that I did not write. Could you give me some pointers on where to begin? Thanks! Some relevant output is below. If I replace the RotateLB load balancer with RefineLB, some ranks do get past the AMPI_Migrate call, but evidently only because the load balancer left them alone.
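For reference, here is roughly what the gutted routine and its call site look like. This is a sketch, not the literal amr.c: the pup_* calls are from Charm++'s C PUP interface in pup_c.h, and the zero-argument AMPI_Migrate() is the 6.7-era AMPI extension API as I understand it.

    #include <stdio.h>
    #include <mpi.h>
    #include "pup_c.h"  /* Charm++ C PUP interface: pup_er, pup_doubles, ... */

    /* Gutted pup routine: no actual packing, it only reports which rank
       entered it.  The real version pups the chunk's scalars and arrays,
       e.g. pup_doubles(p, in_field, total_length).  It was registered
       with AMPI beforehand so that AMPI_Migrate invokes it on migration. */
    void dchunkpup(pup_er p, void *data) {
      int my_ID;
      MPI_Comm_rank(MPI_COMM_WORLD, &my_ID);
      printf("Rank %d entered dchunkpup\n", my_ID);
    }

    /* Call site inside the iteration loop; iter is a long, hence %ld: */
    printf("Rank %d about to call AMPI_Migrate in iter %ld\n", my_ID, iter);
    AMPI_Migrate();
    printf("Rank %d called AMPI_Migrate in iter %ld\n", my_ID, iter);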
Rob
rfvander@klondike:~/esg-prk-devel/AMPI/AMR$ make clean; make amr USE_PUPER=1
rm -f amr.o MPI_bail_out.o wtime.o amr *.optrpt *~ charmrun stats.json amr.decl.h amr.def.h
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicc -O3 -std=c99 -DADAPTIVE_MPI -DRESTRICT_KEYWORD=0 -DVERBOSE=0 -DDOUBLE=1 -DRADIUS=2 -DSTAR=1 -DLOOPGEN=0 -DUSE_PUPER=1 -I../../include -c amr.c
In file included from amr.c:66:0:
../../include/par-res-kern_general.h: In function ‘prk_malloc’:
../../include/par-res-kern_general.h:136:11: warning: implicit declaration of function ‘posix_memalign’ [-Wimplicit-function-declaration]
     ret = posix_memalign(&ptr,alignment,bytes);
           ^
amr.c: In function ‘AMPI_Main’:
amr.c:842:14: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘long int’ [-Wformat=]
     printf("ERROR: rank %d's BG work tile smaller than stencil radius: %d\n",
              ^
amr.c:1080:14: warning: format ‘%d’ expects argument of type ‘int’, but argument 4 has type ‘long int’ [-Wformat=]
     printf("ERROR: rank %d's work tile %d smaller than stencil radius: %d\n",
              ^
amr.c:1518:14: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘long int’ [-Wformat=]
     printf("Rank %d about to call AMPI_Migrate in iter %d\n", my_ID, iter);
              ^
amr.c:1520:14: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘long int’ [-Wformat=]
     printf("Rank %d called AMPI_Migrate in iter %d\n", my_ID, iter);
              ^
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicc -O3 -std=c99 -DADAPTIVE_MPI -DRESTRICT_KEYWORD=0 -DVERBOSE=0 -DDOUBLE=1 -DRADIUS=2 -DSTAR=1 -DLOOPGEN=0 -DUSE_PUPER=1 -I../../include -c ../../common/MPI_bail_out.c
In file included from ../../common/MPI_bail_out.c:51:0:
../../include/par-res-kern_general.h: In function ‘prk_malloc’:
../../include/par-res-kern_general.h:136:11: warning: implicit declaration of function ‘posix_memalign’ [-Wimplicit-function-declaration]
     ret = posix_memalign(&ptr,alignment,bytes);
           ^
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicc -O3 -std=c99 -DADAPTIVE_MPI -DRESTRICT_KEYWORD=0 -DVERBOSE=0 -DDOUBLE=1 -DRADIUS=2 -DSTAR=1 -DLOOPGEN=0 -DUSE_PUPER=1 -I../../include -c ../../common/wtime.c
/opt/charm/charm-6.7.0/multicore-linux64/bin/ampicc -language ampi -o amr -O3 -std=c99 -DADAPTIVE_MPI amr.o MPI_bail_out.o wtime.o -lm -module CommonLBs
cc1plus: warning: command line option ‘-std=c99’ is valid for C/ObjC but not for C++
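(I know about the compiler warnings above. The format complaints are long int values printed with %d; presumably harmless here since the values are small, but the clean fix is %ld, e.g. for amr.c:1518:

    printf("Rank %d about to call AMPI_Migrate in iter %ld\n", my_ID, iter);

The implicit posix_memalign declaration appears because strict -std=c99 hides the POSIX prototype in <stdlib.h>; defining _POSIX_C_SOURCE to 200112L before the include, or compiling with -std=gnu99, should declare it.)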
rfvander@klondike:~/esg-prk-devel/AMPI/AMR$ /opt/charm/charm-6.7.0/bin/charmrun ./amr 20 1000 500 3 10 5 1 FINE_GRAIN +p 8 +vp 16 +balancer RotateLB +LBDebug 1
Running command: ./amr 20 1000 500 3 10 5 1 FINE_GRAIN +p 8 +vp 16 +balancer RotateLB +LBDebug 1
Charm++: standalone mode (not using charmrun)
Charm++> Running in Multicore mode: 8 threads
Converse/Charm++ Commit ID: v6.7.0-1-gca55e1d
Warning> Randomization of stack pointer is turned on in kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try run with '+isomalloc_sync'.
CharmLB> Verbose level 1, load balancing period: 0.5 seconds
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (16-way SMP).
Charm++> cpu topology info is gathered in 0.001 seconds.
[0] RotateLB created
Parallel Research Kernels Version 2.17
MPI AMR stencil execution on 2D grid
Number of ranks                 = 16
Background grid size            = 1000
Radius of stencil               = 2
Tiles in x/y-direction on BG    = 4/4
Tiles in x/y-direction on ref 0 = 4/4
Tiles in x/y-direction on ref 1 = 4/4
Tiles in x/y-direction on ref 2 = 4/4
Tiles in x/y-direction on ref 3 = 4/4
Type of stencil                 = star
Data type                       = double precision
Compact representation of stencil loop body
Number of iterations            = 20
Load balancer                   = FINE_GRAIN
Refinement rank spread          = 16
Refinements:
   Background grid points = 500
   Grid size              = 3993
   Refinement level       = 3
   Period                 = 10
   Duration               = 5
   Sub-iterations         = 1
Rank 12 about to call AMPI_Migrate in iter 0
Rank 12 entered dchunkpup
Rank 7 about to call AMPI_Migrate in iter 0
Rank 7 entered dchunkpup
Rank 8 about to call AMPI_Migrate in iter 0
Rank 8 entered dchunkpup
Rank 4 about to call AMPI_Migrate in iter 0
Rank 4 entered dchunkpup
Rank 15 about to call AMPI_Migrate in iter 0
Rank 15 entered dchunkpup
Rank 11 about to call AMPI_Migrate in iter 0
Rank 11 entered dchunkpup
Rank 3 about to call AMPI_Migrate in iter 0
Rank 1 about to call AMPI_Migrate in iter 0
Rank 1 entered dchunkpup
Rank 3 entered dchunkpup
Rank 13 about to call AMPI_Migrate in iter 0
Rank 13 entered dchunkpup
Rank 6 about to call AMPI_Migrate in iter 0
Rank 6 entered dchunkpup
Rank 0 about to call AMPI_Migrate in iter 0
Rank 0 entered dchunkpup
Rank 9 about to call AMPI_Migrate in iter 0
Rank 9 entered dchunkpup
Rank 5 about to call AMPI_Migrate in iter 0
Rank 5 entered dchunkpup
Rank 2 about to call AMPI_Migrate in iter 0
Rank 2 entered dchunkpup
Rank 10 about to call AMPI_Migrate in iter 0
Rank 10 entered dchunkpup
Rank 14 about to call AMPI_Migrate in iter 0
Rank 14 entered dchunkpup
CharmLB> RotateLB: PE [0] step 0 starting at 0.507547 Memory: 990.820312 MB
CharmLB> RotateLB: PE [0] strategy starting at 0.511685
CharmLB> RotateLB: PE [0] Memory: LBManager: 920 KB CentralLB: 19 KB
CharmLB> RotateLB: PE [0] #Objects migrating: 16, LBMigrateMsg size: 0.00 MB
CharmLB> RotateLB: PE [0] strategy finished at 0.511696 duration 0.000011 s
Segmentation fault (core dumped)
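(One thing I have not tried yet is the startup warning's own suggestion about stack pointer randomization; if that is what breaks thread migration here, the rerun would presumably be

    /opt/charm/charm-6.7.0/bin/charmrun ./amr 20 1000 500 3 10 5 1 FINE_GRAIN +p 8 +vp 16 +balancer RotateLB +LBDebug 1 +isomalloc_sync

or, as root, echo 0 > /proc/sys/kernel/randomize_va_space.)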