
charm - [charm] Fwd: Charm++ on Arcetri cluster

charm AT lists.siebelschool.illinois.edu

Subject: Charm++ parallel programming system

  • From: Phil Miller <mille121 AT illinois.edu>
  • To: Charm Mailing List <charm AT cs.illinois.edu>
  • Subject: [charm] Fwd: Charm++ on Arcetri cluster
  • Date: Wed, 1 Apr 2015 16:38:02 -0500
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm/>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

For some reason, this message to our mailing list got discarded automatically:

---------- Forwarded message ----------
From: "Evgeniia Belousova -X (ebelouso - AAP3 INC at Cisco)" <ebelouso AT cisco.com>
To: "acun2 AT illinois.edu" <acun2 AT illinois.edu>, "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>
Cc: "Landon Noll (chongo)" <chongo AT cisco.com>, "Thomas Gilgan -X (thgilgan - AAP3 INC at Cisco)" <thgilgan AT cisco.com>

Hello all,

I have a problem running a Charm++ application under charmrun, in both SMP and non-SMP modes. For example:

./charmrun ++p 2 my_app app_options
./charmrun +p2 my_app app_options
./charmrun my_app app_options

There is a nodelist file in the same directory containing the following information:

group main ++shell ssh
host pacini004
host pacini005
host pacini006

The program generates a 1D chare array with two elements:

CProxy_Array a = CProxy_Array::ckNew(arr_size); // Array class constructor has no arguments

The first chare in the array works fine, but the other reports a segmentation violation:

------------- Processor 1 Exiting: Caught Signal ------------
Reason: segmentation violation
Suggestion: Try running with '++debug', or linking with '-memory paranoid'
(memory paranoid requires '+netpoll' at runtime).
[1] Stack Traceback:
  [1:0]   [0x5361d4]
  [1:1]   [0x3f266326a0]
  [1:2] _ZN5Sieve5sieveEP11SieveReqMsg+0x182  [0x46dd92]
  [1:3] _ZN13CkIndex_Sieve23_call_sieve_SieveReqMsgEPvS0_+0x2b  [0x46ecd5]
  [1:4] CkDeliverMessageFree+0x28  [0x4a75b8]
  [1:5] _ZN14CkLocRec_local11invokeEntryEP12CkMigratablePvib+0x87  [0x4c7a57]
  [1:6] _ZN14CkLocRec_local7deliverEP14CkArrayMessage11CkDeliver_ti+0x18f  [0x4c8d7f]
  [1:7] _ZN8CkLocMgr7deliverEP9CkMessage11CkDeliver_ti+0x41a  [0x4c331a]
  [1:8] _Z15_processHandlerPvP11CkCoreState+0x493  [0x4ae543]
  [1:9] CsdScheduleForever+0x68  [0x53bb18]
  [1:10] CsdScheduler+0x2d  [0x53bc2d]
  [1:11] ConverseInit+0x34a  [0x5393fa]
  [1:12] main+0x27  [0x49df97]
  [1:13] __libc_start_main+0xfd  [0x3f2661ed5d]
  [1:14]   [0x46cba9]
Fatal error on PE 1> segmentation violation
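
The mangled C++ frames in the traceback are easier to read once demangled. The following is a minimal sketch, assuming a GCC or Clang toolchain that provides abi::__cxa_demangle from <cxxabi.h>; the helper name demangle is hypothetical and is not part of Charm++. Applied to frame [1:2], it shows that the fault occurs inside Sieve::sieve(SieveReqMsg*), at offset 0x182.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper (not a Charm++ API): returns the demangled form of an
// Itanium-ABI symbol, or the input unchanged if demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;  // not a valid mangled name; nothing was allocated
    }
    std::string result(out);
    std::free(out);      // the returned buffer is malloc-allocated
    return result;
}
```

Calling demangle("_ZN5Sieve5sieveEP11SieveReqMsg") yields Sieve::sieve(SieveReqMsg*). The trailing +0x182 and the bracketed address in each frame are not part of the symbol and must be stripped before the call.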

What might cause this? The program works in standalone mode (./my_app app_options).

In addition, I am trying to run charmrun with debugging options (./charmrun ++ssh-display ++debug-no-pause), but it hangs while waiting for the client to connect:

Charmrun> charmrun started...
Charmrun> using ./nodelist as nodesfile
Charmrun> adding client 0: "pacini004", IP:10.10.1.4
Charmrun> Charmrun = 173.36.252.226, port = 56750
start_nodes_rsh
Charmrun> Sending "0 173.36.252.226 56750 20113 0" to client 0.
Charmrun> find the node program "/home/ebelouso/testing/sieve/./sieve" at "/home/ebelouso/testing/sieve" for 0.
Charmrun> Starting ssh pacini004 -l ebelouso /bin/bash -f
Charmrun> remote shell (pacini004:0) started
Charmrun> node programs all started
Charmrun remote shell(pacini004.0)> remote responding...
Charmrun remote shell(pacini004.0)> using xterm /usr/bin/xterm
Charmrun remote shell(pacini004.0)> using debugger /usr/bin/gdb
Charmrun remote shell(pacini004.0)> starting node-program...
Charmrun remote shell(pacini004.0)> rsh phase successful.
Charmrun> Waiting for 0-th client to connect.

Are there any other options I should use for debugging?

Regards,

Evgeniia Belousova




