charm AT lists.siebelschool.illinois.edu
Subject: Charm++ parallel programming system
- From: "Galvez Garcia, Juan Jose" <jjgalvez AT illinois.edu>
- To: Chris Wailes <chris.wailes AT gmail.com>
- Cc: charm <charm AT lists.cs.illinois.edu>
- Subject: RE: [charm] Incorrect T-Dimension Size Information
- Date: Mon, 26 Mar 2018 16:27:55 +0000
Yeah, sorry, I mixed up the terms there. That is what I meant (one router, two nodes).
Seems like in your case CmiNumCores should return 64, but it is actually returning 16. The value comes from sysconf calls. You could try forcing the number of CPU cores with the FORCECPUCOUNT environment variable.
Btw, does the number of PEs and nodes reported by Charm++ (in the output at the start of the application) match what you expect?
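One quick way to see what the host itself reports is to query sysconf directly on a compute node. The following is a minimal sketch, not the code Charm++ uses: the thread does not say which sysconf names CmiNumCores consults, so the two names queried here are an assumption, and `reported_core_counts` is a hypothetical helper name.

```python
import os


def reported_core_counts():
    """Return the processor counts the OS exposes through sysconf.

    Assumption: these two sysconf names are the relevant ones; the thread
    does not say which names CmiNumCores actually queries.
    """
    counts = {}
    for name in ("SC_NPROCESSORS_ONLN", "SC_NPROCESSORS_CONF"):
        if name in os.sysconf_names:
            try:
                counts[name] = os.sysconf(name)
            except (ValueError, OSError):
                # The name is known but the system cannot answer; skip it.
                pass
    return counts


if __name__ == "__main__":
    for name, value in reported_core_counts().items():
        print(f"{name} = {value}")
```

Running this through the job launcher on a compute node, rather than on a login node, shows the values seen where the application actually runs.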
From: Chris Wailes [chris.wailes AT gmail.com]
Sent: Monday, March 26, 2018 11:17 AM
To: Galvez Garcia, Juan Jose
Cc: charm
Subject: Re: [charm] Incorrect T-Dimension Size Information
Juan,
Thanks for the information.
On Mon, Mar 26, 2018 at 12:07 PM, Galvez Garcia, Juan Jose <jjgalvez AT illinois.edu> wrote:
Chris,
I don't fully understand the scenario you are dealing with, like where the 128 comes from. If there are 2 geminis per 3D coordinate, shouldn't T be at max 64?
In any case, I can give you some pointers on how the T value is calculated for XE6 and how you can change it.
X, Y, Z coordinates in the 3D torus are obtained via calls to the Cray rca library (the code to get these values is in `src/util/topomanager/CrayNid.c`).
The T dimension is calculated in `src/util/topomanager/XTTorus.h` as `CmiNumCores() * CPU_FACTOR`.
CmiNumCores is defined in `src/ck-core/cputopology.C` and uses sysconf calls to determine the number of cores per host. Not sure exactly which sysconf calls are determining the number of cores in your case, but you should be able to find out. Also, you can force your own value using the FORCECPUCOUNT environment variable.
CPU_FACTOR is set to 2 for XE6 in XTTorus.h. I assume the 2 comes from the fact that 2 geminis make one node in the 3D topology.
-Juan
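The arithmetic behind the reported value can be checked with a few lines. This is only a sketch of the formula quoted above (`CmiNumCores() * CPU_FACTOR`, with CPU_FACTOR set to 2 for XE6); the function names are hypothetical and it does not call into Charm++.

```python
# CPU_FACTOR value the thread gives for XE6 (set in XTTorus.h).
CPU_FACTOR_XE6 = 2


def t_dimension(cores_per_host, cpu_factor=CPU_FACTOR_XE6):
    """T size following the formula CmiNumCores() * CPU_FACTOR."""
    return cores_per_host * cpu_factor


def cores_for_target_t(target_t, cpu_factor=CPU_FACTOR_XE6):
    """Core count that would have to be detected (or forced) to get target_t."""
    if target_t % cpu_factor != 0:
        raise ValueError("target T is not a multiple of CPU_FACTOR")
    return target_t // cpu_factor


if __name__ == "__main__":
    print(t_dimension(16))          # T for a detected core count of 16
    print(cores_for_target_t(128))  # core count needed for T = 128
```

With a detected count of 16 the formula gives 32, the T size the TopoManager reports; a T size of 128 would need a core count of 64, whether detected or forced through FORCECPUCOUNT.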
From: Chris Wailes [chris.wailes AT gmail.com]
Sent: Tuesday, March 20, 2018 9:37 AM
To: charm
Subject: [charm] Incorrect T-Dimension Size Information
I am attempting to use Charm on a Cray XE6 machine with 16-core AMD Abu Dhabi chips. The way this machine is set up, the job management system treats a single CPU as a node with 32 processing elements (16 physical cores / 32 logical cores).
I've been able to run programs from the test/ and examples/ directories using core counts from 1 to 128 (across 4 of the job manager's nodes). Unfortunately, the size of the T dimension as reported by the TopoManager is always 32, instead of the correct value of 128.
This seems to indicate that one of three things is happening:
- The part of Charm++ responsible for assigning jobs has the correct size of the T-Dimension that it uses, and there is simply a discrepancy between that value and the value reported from the TopoManager.
- The part of Charm++ responsible for assigning jobs also believes that the T-Dimension is only 32, and as a result work is only being allocated to the first 32 processing elements connected to the router. Everything works fine, but only a quarter of the available resources are being used.
- Different parts of the Charm++ runtime have different ideas of what the T-Dimension size is. Given a chance, the runtime might try to assign a chare to a PE with a T-coordinate >= 32 (assuming 0 indexing), causing a runtime error/exception, but I have been lucky enough not to encounter this yet.
My questions then are: which of these three scenarios is occurring, and how do I get the TopoManager to report the correct size for the T dimension?
- Chris
- [charm] Incorrect T-Dimension Size Information, Chris Wailes, 03/20/2018
- RE: [charm] Incorrect T-Dimension Size Information, Galvez Garcia, Juan Jose, 03/26/2018
- Re: [charm] Incorrect T-Dimension Size Information, Chris Wailes, 03/26/2018
- RE: [charm] Incorrect T-Dimension Size Information, Galvez Garcia, Juan Jose, 03/26/2018
- Re: [charm] Incorrect T-Dimension Size Information, Chris Wailes, 03/30/2018