Hello all,
I hope you don’t mind a few follow-up questions, after Ronak’s very useful input.
- If in a PE’s log there are BEGIN_PROCESSING and END_PROCESSING events for a certain entry method without intervening CREATION events, does that mean that the entry method does not send
explicit messages, but uses parameter marshalling for the communication? In this case, is it possible to determine how much time was spent on communication and how much on computation?
- I understand that in the CREATION records (with a “1” in the first field) the fifth field is the sequence number of the event on that PE. When I search for all creation records on a specific
PE, the fifth field is indeed monotonically increasing. However, some sequence numbers are missing. Are there other events (not CREATION) that make up for the missing sequence numbers? I did some quick search through the log, and it didn’t look like it.
- Is it correct to interpret the time stamps of corresponding CREATION and START_PROCESSING events on different PEs as the time a send operation returned on the source and the time the receive
operation returned on the target? If so, are the time stamps globally consistent, so that I can compare stamps on different PEs to determine how it took from send to receive a message? Also, if my interpretation is correct, there should be a 1-1 correspondence
between CREATION and START_PROCESSING events. Would there also be a matching END_PROCESSING event on the receiving PE in this case, and, if so, what would be the meaning of it?
- If I use SDAG methods in my code, I have essentially a very-long-running entry method, right (sorry, it has been a few years since I implemented Stencil in Charm++, and I don’t remember
all the right terminology)? Should I then expect many CREATION events (message sends) between the beginning and end of the main entry method?
Here are a few details I found out about the extremely simple
Stencil workload from the Parallel Research Kernels, which I ran like this:
./charmrun +p16 ./stencil 9 1000 4.This means 9 iterations (plus one not included in the timing but included in the trace), a grid of 1000x1000 points, and an overdecomposition factor of 4, so
a chare array of size 8x8, divided among 16 PEs. I scanned the traces for event type 1, 2, and 3, and recorded the most frequently occurring. The code is iterative, and I can see that some event counts are independent of the number of iterations, so not really
interesting. The top recurring ones are as follows for PE 0 (there are some differences among the PEs, no doubt caused by the fact that some chares do not have topological neighbors in some coordinate direction, because they work on a part of the grid adjacent
to the domain boundary):
Entry method
|
#creation events
|
#begin_processing events
|
#end_processing events
|
Stencil_begin_iteration
|
0
|
40
|
40
|
Stencil_compute
|
0
|
40
|
40
|
Stencil_processGhosts
|
0
|
110
|
110
|
SDAG_RTS
|
0
|
126
|
126
|
receiveGhosts(ghostMsg* impl_msg)
|
110
|
110
|
110
|
Does this make sense to you? As you can see, only receiveGhosts has creation events, and is has as many of those as it has begin_processing and end_processing events. Of the high-frequency methods, the only one I did not explicitly name
is SDAG_RTS. A typical trace record for the being_processing event related to SDAG_RTS is this:
2 65535 3 188894 -1 0 0 0 0 2 0 0 157140
For all records the datum after the time stamp is -1. I don’t know how to interpret that. Also, according to Ronak’s earlier message, the seventh item in this record signifies message length, but in all records of this type of SDAG_RTS
the seventh field is zero. Should I simply ignore these events as non-communicating?
Thanks for any help you can offer!
Rob
If I'm understanding what you're trying to do, I think you can do it all using the Projections logs themselves. There are CREATION events that represent message sends in the logs; these events are lines beginning with 1. The third field
on this line represents the entry ID of the message (that is to say, what method will be executed on the target), the fourth field the timestamp. The fifth and sixth fields are the event and PE, respectively. These can be matched against BEGIN_PROCESSING lines
to see where the triggering message came from.
For example, this CREATION line on PE 1:
1 19 135 303111 2 1 112 0
corresponds to this BEGIN_PROCESSING on PE 2:
2 4 135 303136 2 1 112 0 3 127966
You can see that the two fields after the timestamp match exactly (2 1). These mean that this creation was the second "event" on PE 1 and that the execution was triggered by the second "event" on PE 1, respectively.
This is the mechanism that Projections uses to determine dependencies. If you do a linear read through every log file, CREATION events that occur between BEGIN_PROCESSING and END_PROCESSING events are attributed to that entry method. We
use this to construct a map from every message in the application to its sending entry method and another map from every entry method to a list of the messages it sends. We can then use these to do both forward and backward dependency analysis.
To answer your other questions, I think logging events in _processHandler should work, but that will only give you receive events and might be too low level to easily capture all the information you want.
As far as not getting output, did you enable the ARRAY_DEBUG_OUTPUT flag? Doing so should give you some debug output.
On Wed, Jun 13, 2018 at 11:12 AM, Rob Van der Wijngaart <robv AT nvidia.com> wrote:
Hello Team,
I am guessing that Ronak is out of the office. Could someone else take a stab at answering my questions? Thanks much!
Rob
To follow up some more, I edited the block I mentioned as below, but when I run my code, which uses Chare arrays, I do not get any debug output. Is that to be expected? Thanks.
Rob
#if ARRAY_DEBUG_OUTPUT
# define DEB(x) /*CkPrintf x*/ //General debug messages
# define DEBI(x) /*CkPrintf x*/ //Index debug messages
# define DEBC(x) /*CkPrintf x*/ //Construction debug messages
# define DEBS(x) CkPrintf x //Send/recv/broadcast debug messages
# define DEBM(x) /*CkPrintf x*/ //Migration debug messages
# define DEBL(x) /*CkPrintf x*/ //Load balancing debug messages
# define DEBK(x) /*CkPrintf x*/ //Spring Cleaning debug messages
# define DEBB(x) /*CkPrintf x*/ //Broadcast debug messages
# define AA "ArrayBOC on %d: "
# define AB ,CkMyPe()
# define DEBUG(x) x
#else
# define DEB(X) /*CkPrintf x*/
# define DEBI(X) /*CkPrintf x*/
# define DEBC(X) /*CkPrintf x*/
# define DEBS(x) /*CkPrintf x*/
# define DEBM(X) /*CkPrintf x*/
# define DEBL(X) /*CkPrintf x*/
# define DEBK(x) /*CkPrintf x*/
# define DEBB(x) /*CkPrintf x*/
# define str(x) /**/
# define DEBUG(x)
#endif
Hi Ronak,
To follow up on yesterday’s message, I noticed at the top of ck-core/ckarray.C that it is possible to enable certain messages by editing the block shown below. I will experiment
with that now, but have a few questions:
-
DEBS(x) appears to control debug IO for send/recv/broadcast, but DEBB(x) also controls debug IO for broadcasts. Is the latter simply a subset of the former?
-
I don’t see a similar debug block in ck.C. Is it possible to obtain similar functionality, though?
Thanks!
Rob
#define ARRAY_DEBUG_OUTPUT 0
#if ARRAY_DEBUG_OUTPUT
# define
DEB(x) CkPrintf x //General debug messages
# define
DEBI(x) CkPrintf x //Index debug messages
# define
DEBC(x) CkPrintf x //Construction debug messages
# define
DEBS(x) CkPrintf x //Send/recv/broadcast debug messages
# define
DEBM(x) CkPrintf x //Migration debug messages
# define
DEBL(x) CkPrintf x //Load balancing debug messages
# define
DEBK(x) CkPrintf x //Spring Cleaning debug messages
# define
DEBB(x) CkPrintf x //Broadcast debug messages
# define AA "ArrayBOC on %d: "
# define AB ,CkMyPe()
# define
DEBUG(x) x
#else
# define
DEB(X) /*CkPrintf x*/
# define
DEBI(X) /*CkPrintf x*/
# define
DEBC(X) /*CkPrintf x*/
# define
DEBS(x) /*CkPrintf x*/
# define
DEBM(X) /*CkPrintf x*/
# define
DEBL(X) /*CkPrintf x*/
# define
DEBK(x) /*CkPrintf x*/
# define
DEBB(x) /*CkPrintf x*/
# define
str(x) /**/
# define
DEBUG(x)
#endif
Hi Ronak,
I’ve played around a bit with the trace, but it still does not give me quite what I need. I am trying to build a dependency graph, which requires, besides the time stamp and the
source of the event, also the target. I am looking at file src/ck-core/ck.C, trying to decide where to insert the requisite print statements. Will I capture all communication events by instrumenting the big case statement in function _processHandler? Thanks!
Rob
OK, thanks, Ronak. I’ll see how this works out once I’ve processed the graph info of my application.
Rob
From: Ronak Buch <rabuch2 AT illinois.edu>
Sent: Tuesday, June 5, 2018 4:05 PM
Yes, you can basically ignore the rest of the fields (they're some mixture of ordering, IDs, performance counters, etc.). The trailing zeros are some of these fields and can usually
be ignored.
dummy_thread_ep is basically a catchall for things that are traced that the runtime doesn't have enough information for to attribute to something else. It can be meaningful, but
it could be a lot of different things depending on what exactly your program is doing, so it's hard for me to make any conclusions without more information. Usually looking at the bookending events that have meaningful attributions is enough to figure out
what's going on in those sections.
On Tue, Jun 5, 2018 at 5:51 PM, Rob Van der Wijngaart <robv AT nvidia.com> wrote:
Thanks much, Ronak!
Below is a representative snippet from one of my log files.
The lines that start with 2 or 3 have 8 and 13 fields each. I guess I can ignore the fields that you’re not describing. A lot of the records that start with 2 or 3 refer to ENTRY
CHARE 0, which refers to dummy_thread_ep, according to the sts file (the line says: “ENTRY CHARE 0 dummy_thread_ep 0 0”). Is that a meaningful mapping, and do the trailing zeroes have any significance? Thanks.
Rob
3 0 355 14898768 569 16 0 11638933
2 4 0 14898768 2950 14 0 0 14 369 0 0 0
3 0 0 14898864 2950 14 0 11638933
14 14898866 14
15 14901654 14
2 5 355 14901657 893 16 100 0 0 0 0 0 11641931
1 4 0 14901667 2951 14 0 0
3 0 355 14901669 893 16 0 11641931
2 5 355 14901669 900 16 100 0 0 0 0 0 11641931
1 4 0 14901679 2952 14 0 0
3 0 355 14901681 900 16 0 11641931
2 4 0 14901681 2952 14 0 0 14 2118 0 0 0
3 0 0 14901772 2952 14 0 11641931
2 5 355 14901775 954 16 100 0 0 0 0 0 11641931
1 4 0 14901787 2953 14 0 0
3 0 355 14901787 954 16 0 11641931
2 4 0 14901789 2953 14 0 0 14 1958 0 0 0
3 0 0 14901878 2953 14 0 11641931
2 5 355 14901880 1019 16 100 0 0 0 0 0 11641931
1 4 0 14901890 2954 14 0 0
3 0 355 14901894 1019 16 0 11641931
2 5 355 14901895 1006 16 100 0 0 0 0 0 11641931
1 4 0 14901906 2955 14 0 0
3 0 355 14901906 1006 16 0 11641931
2 4 0 14901908 2955 14 0 0 14 1653 0 0 0
3 0 0 14901993 2955 14 0 11641931
2 4 0 14901995 2955 14 0 0 14 1349 0 0 0
3 0 0 14902085 2955 14 0 11641931
2 4 0 14902086 2955 14 0 0 14 1796 0 0 0
3 0 0 14902175 2955 14 0 11641931
14 14902176 14
15 14902230 14
From: Ronak Buch <rabuch2 AT illinois.edu>
Sent: Tuesday, June 5, 2018 3:09 PM
No, we don't have a good description of the record format right now other than the source code that generates it in Charm (src/ck-perf/trace-projections.C) and the source code that
parses it in Projections (GenericLogReader.java). I'll provide a brief summary below, but if there's something specific that you want, I can explain it.
In general, each line in the Projections log file represents a specific event. The ones that you're probably interested in are those beginning with 2 or 3, which represent BEGIN_PROCESSING
and END_PROCESSING events (the start and end of entry methods). The third field in those lines represents the entry ID, which can be converted a function name by using the sts file (in which ENTRY CHARE lines provide a mapping between an entry ID and the actual
name of the entry method). The fourth field represents the timestamp of the event (so the difference between the two values in this position for corresponding BEGIN_PROCESSING and END_PROCESSING events is the total time of the entry method). The sixth value
is the PE, which indicates which PE served as that event's source. The seventh field for BEGIN_PROCESSING events is the message length.
On Tue, Jun 5, 2018 at 4:38 PM, Rob Van der Wijngaart <robv AT nvidia.com> wrote:
OK, I got my .log, .sts, and .projrc files for my run! Next question, if you don’t mind. I know of Projections, but I’d like to use my own graph analysis tools to parse the log
files. Is there a description of the records in the .log and .sts files? Thanks!
Rob
Thanks, Ronak!
Rob
From: Ronak Buch <rabuch2 AT illinois.edu>
Sent: Tuesday, June 5, 2018 2:05 PM
To: Rob Van der Wijngaart <robv AT nvidia.com>
Cc: charm AT cs.illinois.edu
Subject: Re: [charm] Obtaining Charm++ profile
You built Charm++ with the correct flag ("--enable-tracing"). The issue is that the application, when linked, needs the "-tracemode projections" flag.
Specifically, if you're building NAMD (as you seem to be), you can just run "make projections," which includes the "-tracemode projections" flag already.
On Tue, Jun 5, 2018 at 12:47 PM, Rob Van der Wijngaart <robv AT nvidia.com> wrote:
Hello Team,
I am trying to obtain a profile of a NAMD run using Charm++. I built Charm++ this way:
./build charm++ multicore-linux64 gcc --with-production --enable-tracing -j40 -tracemode summary
According to the
man page I should now automatically obtain a profile if I run the application like this:
./namd-2.13-multicore-cuda +traceroot $PWD +p32 +setcpuaffinity +pemap 0-31 +devices 0,1,2,3 stmv/stmv.namd
However, I do not see any profile. If I then add +sumonly to the command line, I get this error message:
WARNING: +sumonly is a command line argument beginning with a '+' but was not parsed by the RTS.
If any of the above arguments were intended for the RTS you may need to recompile Charm++ with different options.
So it appears I did not build Charm++ properly to support tracing. Can you tell me how to do it right?
Second, I am actually interesting in doing projections, not obtaining a summary profile, but I do not understand this instruction in the manual:
Link time option: -tracemode
projections
After all, I only issue a build command to create the runtime, with no explicit link calls issued. Can you advise?
This concerns v6.8.2, which I obtained as a tar ball.
Thanks,
Rob
This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review, use, disclosure or distribution is prohibited.
If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
|