[Wien] .machines for several nodes

Peter Blaha pblaha at theochem.tuwien.ac.at
Mon Oct 12 11:36:07 CEST 2020



On 10/12/20 10:24 AM, Christian Søndergaard Pedersen wrote:
> Thanks a lot for your answer. After re-reading the relevant pages in the 
> User Guide, I am still left with some questions. Specifically, I am 
> working with a system containing 96 atoms (as described in the 
> case.struct file) and 224 inequivalent k-points; i.e. I requested 500 
> k-points, which were distributed as a 7x8x8 grid (448 points) and 
> reduced to 224 inequivalent k-points. Running on 4 nodes with 16 cores 
> each, I want each of the 4 nodes to calculate 56 k-points (224/4 = 56). 
> Meanwhile, each node should handle 24 atoms (96/4 = 24).

lapw1/2 does not really parallelize over "atoms" but over APWs. For any 
single k-point (and matrix element ij) you need a sum over all atoms.
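
Schematically (a sketch of the structure only, not the exact WIEN2k 
expressions):

   H_{ij}(k) = H^{interstitial}_{ij}(k) + \sum_{a=1}^{N_atoms} H^{(a)}_{ij}(k)

i.e. every single matrix element already contains contributions from all 
atoms, so the natural coarse parallelization of lapw1 is over k-points 
(and, with MPI, over blocks of the H and S matrices).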

You want to distribute 56 k-points to each of your nodes (therefore 4 
lines) and can probably use all cores of each node for each job (from 
your first .machines file I had assumed you have only 4 cores/node ??)
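
For concreteness, a minimal sketch of such a .machines file (assuming 16 
physical cores per node and a working MPI-parallel setup; adapt the node 
names to your job):

   #
   lapw0:g008:16 g021:16 g025:16 g028:16
   dstart:g008:16 g021:16 g025:16 g028:16
   1:g008:16
   1:g021:16
   1:g025:16
   1:g028:16
   granularity:1
   extrafine:1

Each "1:" line defines one k-parallel job, so the 224 k-points are split 
into 4 lists of 56, and each job runs as 16 MPI processes on its own node.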

> 

> As for the parallelization over atoms for dstart and lapw0, I 
> understand that the numbers assigned to each individual node should sum 
> up to the number of atoms in the system, like this:
> 
> 
> dstart:g008:24 g021:24 g025:24 g028:24

Yes, this line would span 96 MPI processes. However, the main question 
is what kind of nodes you have. How many cores (real, not virtual) does 
each node have? It does NOT make sense to overload a node heavily.

> so the final .machines file would be a combination of the above pieces. 
> Have I understood this correctly, or am I missing the mark? Also, is 
> there any difference between distributing the k-points across four jobs 
> (1 for each node) and across 224 jobs (by repeating each of the 1:gxxx 
> lines 56 times)?

In "principle yes", but in practice: NO WAY !!

a) Do not overload your nodes. Spawning more processes on a single node 
than it has cores is not really beneficial in most cases.

b) Each parallelization has a certain overhead, and if you set up a stupid 
parallelization, it can easily happen that your calculation runs 10 (or 
more) times SLOWER than in a less highly parallel (or even sequential) mode.
Even if you have 224 cores available, parallelization over 224 k-points 
would mean that all 224 jobs need a certain startup time and then all try 
to read/write to your filesystem at the same time, which would most 
likely produce a tremendous overhead.

c) For an inexperienced user I'd suggest to
  i) learn the details of your hardware (cores/node; filesystem (is 
there a local scratch ?), network speed, ...)
  ii) start out with a medium parallelization and monitor the timing 
(case.dayfile, but also case.output1_*; see the sketch after this list). 
In the "ideal" world, using 2 cores instead of 1 should give a speedup 
of 2. If it does, increase the cores until you see a significant 
decrease of the speedup (but stop for sure before an INCREASE of 
run-time ("wall-time") occurs).
  iii) These considerations depend on the size of your calculations 
(large cases (a few hundred atoms) can run on 512 or more cores, our 
simple TiC "getting started" example only on 2-4 cores).
  iv) Reconsider your input: RKMAX (adapted to your elements/sphere 
sizes) and the k-points. I would for instance NEVER start a 96-atom cell 
with 224 k-points, but probably with ONE !! (insulator) or maybe 10-64 
(metal). Once scf convergence (and force minimization !!!!) is reached, 
save_lapw and increase the k-mesh for checking convergence (see the 
sketch after this list).
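
A minimal sketch of both steps in shell commands (the grep pattern and 
the "lowk" label are only illustrative; check your own output files for 
the exact strings):

   # ii) monitor the timing of the k-parallel lapw1 jobs
   grep TIME case.output1_*     # CPU time per k-point (HAMILT, DIAG, ...)
   tail case.dayfile            # wall-clock time of each program step

   # iv) after scf + force convergence on the coarse k-mesh
   save_lapw lowk               # save the converged calculation as "lowk"
   x kgen                       # generate a denser k-mesh
   run_lapw -p                  # re-converge and compare energies/forces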


> 
> 
> Best regards
> 
> Christian
> 
> ------------------------------------------------------------------------
> *From:* Wien <wien-bounces at zeus.theochem.tuwien.ac.at> on behalf of 
> Ruh, Thomas <thomas.ruh at tuwien.ac.at>
> *Sent:* 12 October 2020 09:29:37
> *To:* A Mailing list for WIEN2k users
> *Subject:* Re: [Wien] .machines for several nodes
> 
> Hi,
> 
> 
> your .machines is wrong.
> 
> 
> The nodes for lapw1 are prefaced not with "lapw1:" but only with "1:". 
> lapw2 needs no line, as it takes the same nodes as lapw1 before.
> 
> 
> So an example for your usecase would be:
> 
> 
> #
> 
> dstart:g008:4 g021:4 g025:4 g028:4
> 
> lapw0:g008:4 g021:4 g025:4 g028:4
> 
> 1:g008:4 g021:4 g025:4 g028:4
> 
> granularity:1
> 
> extrafine:1
> 
> 
> The line starting with "1:" has to be repeated (with different nodes, of 
> course) x times if you want to run x k-points in parallel (you can find 
> more details about this in the User Guide, pages 84-91).
> 
> 
> Regards,
> 
> Thomas
> 
> 
> PS: As a sidenote: Both dstart and lapw0 parallelize over atoms, so 
> 16 processes might not be the best choice for your example.
> 
> ------------------------------------------------------------------------
> *From:* Wien <wien-bounces at zeus.theochem.tuwien.ac.at> on behalf of 
> Christian Søndergaard Pedersen <chrsop at dtu.dk>
> *Sent:* Monday, 12 October 2020 09:06
> *To:* wien at zeus.theochem.tuwien.ac.at
> *Subject:* [Wien] .machines for several nodes
> 
> Hello everybody
> 
> 
> I am new to WIEN2k, and am struggling with parallelizing calculations 
> on our HPC cluster beyond what can be achieved using OMP. In particular, 
> I want to execute run_lapw and/or runsp_lapw on four identical 
> nodes (16 cores each), parallelizing over k-points (unless there's a 
> more efficient scheme). To achieve this, I try to mimic the example from 
> the User Guide (without the extra Alpha node), but my .machines file 
> does not work the way I intended. This is what I have:
> 
> 
> #
> 
> dstart:g008:4 g021:4 g025:4 g028:4
> 
> lapw0:g008:4 g021:4 g025:4 g028:4
> 
> lapw1:g008:4 g021:4 g025:4 g028:4
> 
> lapw2:g008:4 g021:4 g025:4 g028:4
> 
> granularity:1
> 
> extrafine:1
> 
> 
> The node names gxxx are read from SLURM_JOB_NODELIST in the submit 
> script, and a couple of regular expressions generate the above lines. 
> Afterwards, my job script does the following:
> 
> 
> srun hostname -s > slurm.hosts
> run_lapw -p
> 
> which results in a job that idles for the entire walltime and finishes 
> with a CPU efficiency of 0.00%. I would appreciate any help in figuring 
> out where I've gone wrong.
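
For reference, a minimal sketch of a submit-script fragment that builds 
the corrected .machines file from the SLURM nodelist (assuming bash, 16 
cores per node, and that scontrol is available on the cluster):

   # expand the compact SLURM nodelist into individual hostnames
   hosts=($(scontrol show hostnames $SLURM_JOB_NODELIST))
   line=""
   for h in "${hosts[@]}"; do line+="$h:16 "; done
   echo "#"             >  .machines
   echo "lapw0:$line"   >> .machines
   echo "dstart:$line"  >> .machines
   # one k-parallel job per node, 16 MPI processes each
   for h in "${hosts[@]}"; do echo "1:$h:16" >> .machines; done
   echo "granularity:1" >> .machines
   echo "extrafine:1"   >> .machines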
> 
> 
> Best regards
> Christian
> 
> 

-- 

                                       P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/TC_Blaha
--------------------------------------------------------------------------

